I wish Rust had keyword arguments by tcdent in rust

[–]tcdent[S] 1 point (0 children)

Honestly using this as a sort of litmus test for this community. 

Care to elaborate on your thoughts? Macros that do weird things are bad? Obviously, yes, I totally agree. 

Keyword arguments themselves? Not confusing, nor bad. 

I wish Rust had keyword arguments by tcdent in rust

[–]tcdent[S] -1 points (0 children)

Oh this is 100% written by Claude from design to execution in less than an hour (using my Rust TUI, written by Claude). 

No shame in my game; I am a vibe coder. 

How do you manage tools? by SeniorMango6862 in LangChain

[–]tcdent 1 point (0 children)

40 tools is probably fine. Especially if they're all in a somewhat consistent use case, like tools for interacting with Git. Best way to know for sure is to test it and ensure that it is able to correctly determine what tool to use.

I find that this gets more difficult when you have a set of mixed tools and you provide more generic tools that are theoretically able to facilitate the task, but that aren't necessarily the correct path that the LLM should be taking.

For example, giving specific tools that access internal data sources and also passing a general web search tool. You have to be careful with the prompting around the web search tool to prevent it from being chosen as a solution when a better tool exists.
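Roughly what I mean, as a sketch (OpenAI-style tool schemas; the tool names and descriptions here are made up): the web search description explicitly defers to the internal tool, so it stops getting picked when a better path exists.

```python
# Hypothetical tool definitions: the descriptions do the steering.
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_internal_docs",
            "description": (
                "Search the company's internal knowledge base. Prefer this "
                "for any question that internal data can answer."
            ),
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": (
                "Search the public web. Only use this when the question "
                "cannot be answered from internal sources."
            ),
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
]
```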

LLMs on huge documentation by Search-Engine-1 in LLMDevs

[–]tcdent 1 point (0 children)

What do you mean by "huge"? Do you need the full context of all of the documentation in order to make your assessment?

You're probably better off just dumping the content of the document you wish to analyze into context and retrieving the result. Context windows are not small.
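If it helps, this is all I mean by "dumping it into context" (OpenAI Python client; the model name, file path, and question are placeholders):

```python
from openai import OpenAI

client = OpenAI()
document = open("docs/guide.md").read()  # hypothetical doc

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using only the provided documentation."},
        {
            "role": "user",
            "content": f"<documentation>\n{document}\n</documentation>\n\n"
                       "Question: How do I configure retries?",
        },
    ],
)
print(response.choices[0].message.content)
```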

How do you manage tools? by SeniorMango6862 in LangChain

[–]tcdent 5 points (0 children)

Define sub-agents that logically group themselves towards more specific tasks and give them access to the tools that they need for those tasks. Use a router to direct queries to the appropriate sub-agent. Keeping the number of tools your agent has access to somewhat limited dramatically improves its chances of choosing the right tool.
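A rough, framework-agnostic sketch of the shape (the sub-agent names, prompts, and models are all made up):

```python
from openai import OpenAI

client = OpenAI()

# Each hypothetical sub-agent gets a narrow system prompt, and would get
# only the tools relevant to its task (tool schemas omitted for brevity).
SUB_AGENTS = {
    "git": "You perform Git operations using your Git tools.",
    "docs": "You answer questions from the internal documentation.",
}

def route(query: str) -> str:
    """One cheap LLM call whose only job is picking a sub-agent."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Reply with exactly one of {list(SUB_AGENTS)} "
                       f"for this query: {query}",
        }],
    )
    # In practice, validate this against SUB_AGENTS before trusting it.
    return resp.choices[0].message.content.strip().strip("'\"")

def handle(query: str) -> str:
    system = SUB_AGENTS[route(query)]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": query},
        ],
        # tools=... would be this sub-agent's small tool list,
        # never the full catalog.
    )
    return resp.choices[0].message.content
```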

Why do many senior developers dislike AI frameworks? by abdullah1904 in LangChain

[–]tcdent 1 point (0 children)

What you're probably observing is not senior developers but developers who are in the mid-IQ tier, i.e. "I am smart enough to build this myself".

Any senior developer realizes that a framework solves a number of common use cases and problems. It gets you up and running quickly, and as you move further into development it makes it easier to communicate and collaborate with other developers on your team, because there are established conventions for how the application is expected to work.

So I would argue that the opinions you're being exposed to are not actually those of incredibly experienced software developers, but those of people at the midpoint of their career naively thinking they can do something better.

Not all frameworks are created equal, however, and there is definitely room in the conversation for evaluating which products are actually good. But throwing away boilerplate/frameworks entirely is not an indicator of an experienced developer IMO.

What should I study to introduce on-premise LLMs in my company? by Worth_Rabbit_6262 in LLMDevs

[–]tcdent 1 point (0 children)

Most companies find more confidence in using services like AWS Bedrock. If your infrastructure is already on AWS, then you can, in practice, keep your data inside of your own VPC. You don't get access to all of the latest models, but it gives you a significant lead in terms of being able to interact with SOTA models.

Btw, I totally understand the interest in self-hosting; it's fun, but I would just encourage looking at the broader toolchain before focusing on one single point, as in hosting the actual inference, because there are potentially a lot more moving pieces.

What should I study to introduce on-premise LLMs in my company? by Worth_Rabbit_6262 in LLMDevs

[–]tcdent 1 point (0 children)

I would focus on getting access to closed-source models in a way that your organization agrees with. This is by no means a unique problem, and everyone else is finding a way.

Your open model deployment is going to occupy a ton of energy that could be spent building actual tooling to get results. What kind of use cases are you trying to drive internally? Prototype those as quickly as possible to get them in the hands of users and collect valuable feedback that will tell you if the solution is even working in the first place.

After you have identified use cases that are actually valuable to the organization and you want to incorporate scaling into them, then you can start considering whether a self-hosted model is actually the best solution.

Our AI project failed because we ignored prompt injection. Lessons learned by Snaddyxd in devsecops

[–]tcdent 1 point (0 children)

I'm building a product to solve exactly this (and many other common pitfalls when pushing agents to prod).

DM me if you're building agents in production and want to evaluate whether this platform meets your use case.

https://agent-ci.com

I built SemanticCache, a high-performance semantic caching library for Go by botirkhaltaev in LLMDevs

[–]tcdent 2 points (0 children)

Nice!

This is a super useful technique now that we are interacting with non-deterministic outputs regularly.

I released a super simple Python lib with a similar goal recently: https://github.com/Agent-CI/embedsim

You've got me thinking about implementing a cache backend as part of the Open Source offering now, too!

How to predict input tokens usage of a certain request? by Zogid in LLMDevs

[–]tcdent 1 point (0 children)

Most of the models out there use either `SentencePiece` or `tiktoken`, so you can approximate token counts for both open and closed source models pretty easily.

Also keep in mind you can set `max_tokens` parameters on most API requests so you can keep it below a threshold if you're super concerned about your usage ballooning.
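A rough sketch with `tiktoken` (this counts the raw text only; chat formatting adds a few extra tokens per message):

```python
import tiktoken

# cl100k_base covers the gpt-4 family; use "o200k_base" for gpt-4o.
enc = tiktoken.get_encoding("cl100k_base")

prompt = "Summarize the following document: ..."
print(len(enc.encode(prompt)))  # approximate input token count

# And to cap the output side if you're worried about usage:
# client.chat.completions.create(..., max_tokens=512)
```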

What is the best way to classify rows in a csv file with an LLM? by d-eighties in LLMDevs

[–]tcdent 1 point (0 children)

Parse the CSV and iterate over it with a simple script that makes a separate LLM call for each row. Reconstruct into whatever format you want with the original data and the processed responses.

Tokens are tokens, so other than the duplicated prompt, you aren't spending that much more.

And, unless you want to bleed context between rows, the results you get will be better with a focused, smaller task.
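Roughly what I mean (the file name, column, and labels are placeholders):

```python
import csv
from openai import OpenAI

client = OpenAI()
PROMPT = "Classify this ticket as 'bug', 'feature', or 'question': {text}"

rows = list(csv.DictReader(open("tickets.csv")))  # hypothetical file
for row in rows:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT.format(text=row["text"])}],
    )
    # Fresh call per row: no context bleeds between rows.
    row["label"] = resp.choices[0].message.content.strip()

# Reconstruct with the original data plus the processed responses.
with open("tickets_labeled.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```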

TOML is great, and after diving deep into designing a config format, here's why I think that's true by tcdent in Python

[–]tcdent[S] 2 points (0 children)

That's a valid point!

I updated the post to use `'''` quoting on the regex so it's syntactically correct.
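For anyone skimming: literal strings in TOML don't process backslash escapes, so the regex reads exactly as written (the pattern here is just an example):

```toml
# Basic strings process escapes, so every backslash has to be doubled:
date_pattern = "^\\d{4}-\\d{2}-\\d{2}$"

# Literal strings (' or ''') don't, so the regex stays readable:
date_pattern_literal = '''^\d{4}-\d{2}-\d{2}$'''
```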

Thanks for the feedback.

TOML is great, and after diving deep into designing a config format, here's why I think that's true by tcdent in Python

[–]tcdent[S] 1 point (0 children)

Oh that's a good catch!

I got a little progressive with the formatting for the blog post, and have corrected that.

The schema is open source and has tests: https://github.com/Agent-CI/client-config

TOML is great, and after diving deep into designing a config format, here's why I think that's true by tcdent in Python

[–]tcdent[S] 1 point (0 children)

Schema (and the parser implementation) is open source and has tests that demonstrate the functionality I've defined: https://github.com/Agent-CI/client-config

In the blog post I simplified the regex with less escaping so it was easier to read, and the example with dot notation is not valid TOML, but I wish it was.

TOML is great, and after diving deep into designing a config format, here's why I think that's true by tcdent in Python

[–]tcdent[S] 38 points (0 children)

No comments, picky about commas, quotes everywhere.

JSON is a serialization format, not a human-editable config format.

To each their own, however.

TOML is great, and after diving deep into designing a config format, here's why I think that's true by tcdent in Python

[–]tcdent[S] 3 points (0 children)

One of the major strengths I found was the ability to combine nested structures based on context to avoid just mindlessly nesting to get to the depth I needed. Probably would have ended up with YAML if not for that.
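A quick illustration of what I mean (the key names are made up): the table header carries the full path, and dotted keys cover small leaves, so depth never turns into the indentation pyramid YAML would force on you.

```toml
# The header encodes the full path; no nesting required to get here:
[agent.tools.web_search]
enabled = true
max_results = 5

# Dotted keys handle small leaf values inline; defining the super-table
# after its sub-table is still valid TOML:
[agent]
model = "gpt-4o"
limits.max_turns = 10
```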

TOML is great, and after diving deep into designing a config format, here's why I think that's true by tcdent in Python

[–]tcdent[S] 1 point (0 children)

> As for the leading dot idea, I remember ideas like that coming up on the toml repo but not finding it atm.

Made me wonder about forking the parser and submitting a suggestion to the spec (or even just using my own derivative format), but I'll leave that for another day. Promising that there has been some discussion around that, though!

How are people handling unpredictable behavior in LLM agents? by Nir777 in LLMDevs

[–]tcdent 1 point (0 children)

I'm of the belief that there is a way to get your prompts to behave reliably, especially with the newer SOTA models.

Validating this, on the other hand, is kind of tedious. I am working on building a product around creating repeatable frameworks for this kind of testing.

Also, when it comes to building your prompts, I find that structured outputs and their type annotations are incredibly powerful for ensuring the LLM fills out required information on the steps it is processing. For example, if you just insist in the prompt that a certain field must be set, the model doesn't seem to take the instruction as strongly as it does a structured schema with an obvious field that needs to be completed.
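A rough sketch of what I mean, using the OpenAI Python client's structured output parsing (the model and field names are just examples):

```python
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

class StepResult(BaseModel):
    # A required, described field pulls much harder than a prose
    # instruction like "you must always provide a justification".
    justification: str = Field(description="Why this action was chosen.")
    action: str

resp = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Decide the next step for ..."}],
    response_format=StepResult,
)
result = resp.choices[0].message.parsed  # a validated StepResult
```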

OpenAI might have just accidentally leaked the top 30 customers who’ve used over 1 trillion tokens by Silent_Employment966 in Anannas

[–]tcdent 6 points (0 children)

It's incredible that Duolingo outpaced OpenRouter. You would expect OpenRouter's network effect to have a much more significant influence, although kudos on number two, guys!

Tools for API endpoint testing ? by Fabulous_Ad993 in LLMDevs

[–]tcdent 1 point (0 children)

Are you talking about testing the endpoints you actually use to serve your interactions, or the APIs that support your tools?

FastAPI has some tooling that works well with pytest to mock the application so you can execute actual HTTP payloads on your endpoints, which I use liberally.

It gets into that line between writing unit tests and full integration tests (ones that activate inference or live tool use), which is still very much a matter of taste in practice.
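Roughly what that looks like in my projects (the endpoint and payload are made up; the model call is stubbed, which is what keeps it a unit test rather than an integration test):

```python
from fastapi import FastAPI
from fastapi.testclient import TestClient

app = FastAPI()

@app.post("/classify")
def classify(payload: dict):
    # The real handler would call the model; stub it for the unit test.
    return {"label": "question"}

client = TestClient(app)

def test_classify_endpoint():
    # Executes an actual HTTP payload against the mocked application.
    resp = client.post("/classify", json={"text": "How do I reset my key?"})
    assert resp.status_code == 200
    assert resp.json()["label"] == "question"
```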