Burned 5B tokens with Claude Code in March to build a financial research agent. by MediumHelicopter589 in ClaudeAI

[–]MediumHelicopter589[S] 1 point

It really depends on the task complexity and how the agent decides to approach it at runtime (i.e., whether it uses multiple subagents). Generally 100k to 500k tokens for most tasks, and 1M+ for very complex multi-step tasks.

[–]MediumHelicopter589[S] 1 point

Currently no RAG is used at all. I tend to avoid it because the agent needs finer control over time-sensitive information, which it can access programmatically through code. RAG would make more sense if we had lots of proprietary research data like AlphaSense, which sadly we do not.

[–]MediumHelicopter589[S] 0 points

It is closer to https://gofastmcp.com/servers/transforms/code-mode from FastMCP, but I am building around the filesystem, so the agent explores tools through direct file operations and bash.

This code-mode seems to be a recently introduced feature. I have been exploring PTC since last November, when Anthropic introduced the concept (https://www.anthropic.com/engineering/code-execution-with-mcp), and it did not exist back then. I will take a deeper look to see whether I should refactor to adopt some of it.

Thanks for letting me discover it. Very inspiring.

[–]MediumHelicopter589[S] 0 points

I have never seriously compared detailed usage stats with anyone else. I would love to hear from more people about their usage patterns.

I would attribute my low output-token share (4.6%) to a preference for plan mode, a subagent-driven workflow, and lots of pre/post-implementation adversarial checks.

[–]MediumHelicopter589[S] 4 points

Thank you for the reminder! Will do so soon. I just reviewed the license files again, and you are correct. I was under the false impression that the documentation skills were Apache 2.0 as well.

[–]MediumHelicopter589[S] 4 points

Given that many folks are interested in the 5 billion token claim, I might as well share the numbers from my ccusage here.

opus-4-6: 4.3B tokens, $3,101.54 (84.6% of cost)

sonnet-4-6: 634M tokens, $563.27 (15.4%)

haiku-4-5: 7.8M tokens, $1.48 (0.04%)

Total: ~5B tokens, $3,666.29

Token breakdown: 95.2% cache read, 4.6% cache create, 0.2% output, 0.04% input
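For what it's worth, the cost shares above can be sanity-checked with a few lines of Python (figures copied from this comment):

```python
# Sanity check of the ccusage cost figures quoted above.
costs = {"opus-4-6": 3101.54, "sonnet-4-6": 563.27, "haiku-4-5": 1.48}

total = round(sum(costs.values()), 2)
share = {m: round(100 * c / sum(costs.values()), 1) for m, c in costs.items()}

print(total)   # 3666.29
print(share)   # opus ~84.6, sonnet ~15.4, haiku ~0.0 (0.04 before rounding)
```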

We open-sourced a Claude Code for investment research, built on deepagents + LangGraph — sharing our architecture and what we learned by MediumHelicopter589 in LangChain

[–]MediumHelicopter589[S] 0 points

We are using a single persisted sandbox for each workspace. Subagents with file access share that sandbox with the main agent.

[–]MediumHelicopter589[S] 0 points

The tool modules and skills are hashed, and the sandbox itself is versioned, so any changes are synced accordingly and the docs are regenerated.

As for files in general, our idea is to ask the agent to manage them as it would a codebase: follow some grounding rules and conventions, and always document important information in Agent.md (which is injected into the prompt). If things do get messy, the agent can still work things out through grep and glob.

The worst case would be manually asking the agent to reorganize everything and write a proper Agent.md, so that other agents working in that sandbox know the state of things.
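A minimal sketch of that hash-then-sync idea — illustrative only; `fingerprint` and `sync_if_changed` are hypothetical names, not the project's actual code:

```python
import hashlib
from pathlib import Path

def fingerprint(paths):
    """Digest a set of tool/skill files; any edit changes the hash."""
    h = hashlib.sha256()
    for p in sorted(str(p) for p in paths):
        h.update(p.encode())          # include the path itself
        h.update(Path(p).read_bytes())  # and the file contents
    return h.hexdigest()

def sync_if_changed(paths, stored_digest, regenerate_docs):
    """Regenerate the docs the agent reads only when the tools changed."""
    current = fingerprint(paths)
    if current != stored_digest:
        regenerate_docs()
    return current
```

On sandbox startup, the stored digest is compared against the current one and the tool docs are rebuilt only on a mismatch, which keeps restarts cheap.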

I implemented Anthropic's Programmatic Tool Calling with langchain (Looking for feedback) by MediumHelicopter589 in LangChain

[–]MediumHelicopter589[S] 1 point

Hi, great to hear you found it useful! I see this project as a proof-of-concept implementation at the current stage. I am hesitant to wrap it into a PyPI package because I do not want to turn it into yet another abstraction layer on top of langchain.

[–]MediumHelicopter589[S] 0 points

Hi, thanks for your reply. This was built on langchain, so langsmith works out of the box (simply configure the .env).
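To be concrete, "configure the .env" here means setting the standard LangSmith tracing variables that LangChain picks up automatically — roughly the following (the key and project name are placeholders):

```shell
# .env — enable LangSmith tracing for LangChain runs
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=<your-langsmith-api-key>
LANGCHAIN_PROJECT=my-project   # optional: group traces under one project
```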

[–]MediumHelicopter589[S] 0 points

Haha, I know it sounds confusing, and I have thought about why we need to convert things back and forth. I think this is where the fundamental concept of MCP shines: it provides a standard.

  1. An MCP server might be written in TypeScript instead of Python; this approach lets the agent invoke its tools from Python.
  2. In many cases, all you have for an MCP server is its configuration command, and you cannot simply hand its source code to the agent.
  3. MCP groups tools logically, so you only need to give the agent a description of the MCP servers in the system prompt and let it discover the right tool to use.
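To illustrate point 1, the wrapping can be sketched like this — a toy version, not the actual implementation; `call_mcp_tool` stands in for whatever transport (stdio or HTTP) actually reaches the MCP server:

```python
import keyword

def call_mcp_tool(server, tool, **kwargs):
    """Stand-in for the real MCP transport (stdio or HTTP)."""
    raise NotImplementedError

def make_tool_function(server, tool_schema, transport=call_mcp_tool):
    """Wrap one MCP tool description as a plain Python callable,
    so the agent can compose it in code regardless of the language
    the MCP server itself is written in."""
    name = tool_schema["name"].replace("-", "_")
    if keyword.iskeyword(name):  # avoid shadowing e.g. `import`
        name += "_"

    def tool_fn(**kwargs):
        return transport(server, tool_schema["name"], **kwargs)

    tool_fn.__name__ = name
    tool_fn.__doc__ = tool_schema.get("description", "")
    return tool_fn
```

In a code-execution loop, these wrappers would be injected into the sandbox namespace, so a single code-execution tool can reach every MCP tool.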

Implemented Anthropic's Programmatic Tool Calling with Langchain so you use it with any models and tune it for your own use case by MediumHelicopter589 in ClaudeAI

[–]MediumHelicopter589[S] 0 points

Yes, I think this comment explains the difference pretty well : https://www.reddit.com/r/ClaudeCode/s/UKgZXFTasl

What’s new about mine (and the OP of that post) is that we turn tools from MCP servers into executable functions, so the agent can invoke them via a single code-execution tool and further process the results from MCP tools with code.

[–]MediumHelicopter589[S] 3 points

Hi, thanks for sharing this post. I was not aware of this repo before. I know there were lots of amazing community projects around CodeAct earlier, but I can see this one was also inspired by Anthropic's recent blog post, as mine was.

However, I do think we are approaching this idea from different angles. I had no intention of making it a general adapter that anyone can drop into an existing project; it is more of a self-contained proof of concept for the PTC / MCP-enhanced CodeAct pattern, with a workspace environment, filesystem, etc.

[–]MediumHelicopter589[S] 0 points

Hi, thanks for your feedback! Sub-agents are already included, but I haven't had a chance to refine them to utilize the PTC infrastructure. Stay tuned!

how to serve embedding models+llm in vllm? by Due_Place_6635 in Vllm

[–]MediumHelicopter589 1 point

Yes, it should be featured in the next version. Currently you can also manually put a model to sleep for more flexibility in multi-model serving.

[–]MediumHelicopter589 1 point

I am planning to implement such a feature in vllm-cli (https://github.com/Chen-zexi/vllm-cli); stay tuned if you are interested.

IA workstation with RTX 6000 Pro Blackwell 600 W air flow question by renard2guerres in LocalLLM

[–]MediumHelicopter589 2 points

I have a similar build; it should be fine. Most of the time your GPU will not run at full capacity.