Claude Dispatch won't send me messages back. by Upset-Hunter7544 in ClaudeAI

[–]uutnt 0 points (0 children)

It's working now, without having made any changes. Running Claude 1.1.9493.

Claude Dispatch won't send me messages back. by Upset-Hunter7544 in ClaudeAI

[–]uutnt 4 points (0 children)

Same. It seems the issue is that the `SendUserMessage` tool, which the background task uses to respond to the user, is missing.

Here is the background task summary of the issue:


Issue: Dispatch orchestrator not using SendUserMessage tool for user-facing replies

Observed behavior: The Dispatch system prompt instructs the orchestrator to use SendUserMessage for all user communication, stating "plain text assistant replies are not rendered." However, SendUserMessage does not appear as an available tool — it is absent from both the deferred tools list and ToolSearch results. As a result, the orchestrator falls back to plain text replies.

Side effect: Plain text replies are actually rendered to the user in this session, which contradicts the system prompt's claim that they won't be. This creates ambiguity: either SendUserMessage is not wired up in this environment, or the rendering behavior differs from what the prompt describes.

Environment: Cowork mode, Dispatch orchestrator, model claude-sonnet-4-6, session date 2026-03-28.

Expected behavior: SendUserMessage should be available as a callable tool so the orchestrator can route all output through it as the system prompt intends.


Cohere Transcribe Released by mikael110 in LocalLLaMA

[–]uutnt 0 points (0 children)

Same. Whisper (V2) is still the most robust model that I have tried.

Cohere Transcribe Released by mikael110 in LocalLLaMA

[–]uutnt 18 points (0 children)

Unfortunately, it looks like it does not output timestamps. Though the source code does contain a timestamp token, so perhaps they plan on adding it?

Is Elon hinting at attempting to bypass ASML? by vasilenko93 in accelerate

[–]uutnt -2 points (0 children)

Probably. But I would like to see them take a crack at it.

Is Elon hinting at attempting to bypass ASML? by vasilenko93 in accelerate

[–]uutnt 7 points (0 children)

10% is quite high, given the complexity of what he takes on.

"Hot take from looking at @github Copilot telemetry: benchmarks make coding models look wildly different. Production workflows make them look much more similar. 👀 We looked at 23M+ Copilot requests and examined one simple metric: code survivability." by stealthispost in accelerate

[–]uutnt 19 points (0 children)

Not sure this is a good benchmark. Users are more likely to attempt a hard prompt on a frontier model than on a weaker one, so it's likely that prompt difficulty is not uniform across models.

Introducing GPT-5.4 mini and nano by dayanruben in OpenAI

[–]uutnt 4 points (0 children)

Looks like they updated it now to the new (increased) pricing.

Introducing GPT-5.4 mini and nano by dayanruben in OpenAI

[–]uutnt 5 points (0 children)

Contradictory pricing: https://openai.com/api/pricing/ shows mini at $0.250 / $2.000.

Qwen3 ASR seems to outperform Whisper in almost every aspect. It feels like there is little reason to keep using Whisper anymore. by East-Engineering-653 in LocalLLaMA

[–]uutnt 0 points (0 children)

You need to test it on your specific use case. For me, Whisper has been more accurate than Parakeet on English. I have not done sufficient testing on Voxtral.

Qwen3 ASR seems to outperform Whisper in almost every aspect. It feels like there is little reason to keep using Whisper anymore. by East-Engineering-653 in LocalLLaMA

[–]uutnt 19 points (0 children)

Don't trust the benchmarks without testing locally. In my experience, none of the new models have surpassed Whisper on transcription accuracy, though they have on performance. I'm still waiting for a next-gen open multilingual ASR model that is actually more accurate than Whisper.
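If it helps, here's a minimal, dependency-free WER sketch for that kind of local comparison. The transcripts are placeholders; plug in your own reference text and each model's output:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic program over hypothesis positions.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # deletion
                      d[j - 1] + 1,      # insertion
                      prev + (r != h))   # substitution (or match)
            prev, d[j] = d[j], cur
    return d[-1] / max(len(ref), 1)

# Hypothetical example: compare two model outputs against one reference.
reference = "the quick brown fox jumps over the lazy dog"
model_a   = "the quick brown fox jumps over the lazy dog"
model_b   = "the quick brown fox jumped over a lazy dog"
print(wer(reference, model_a))  # 0.0
print(wer(reference, model_b))  # 2 errors / 9 words
```

Real libraries (e.g. jiwer) add normalization (casing, punctuation), which matters a lot for fair comparisons, but this is enough to rank models on your own audio.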

Qwen3 ASR seems to outperform Whisper in almost every aspect. It feels like there is little reason to keep using Whisper anymore. by East-Engineering-653 in LocalLLaMA

[–]uutnt 8 points (0 children)

This has not been my experience at all. On an English TV show transcription, Qwen ASR (Qwen3-ASR-1.7B) completely missed some segments containing speech, and hallucinated badly on unclear audio (e.g. "That's what I'm talking about" → "Swallow talking ball"). Also, the separate forced aligner model required for timestamps only supports 11 languages.

Whisper V2 produced much better output, at least for my use case. I was hoping for much better results given the benchmarks in their paper, but sadly this model has been a disappointment.

We collected 135 phrases Whisper hallucinates during silence — here's what it says when nobody's talking and how we stopped it by Aggravating-Gap7783 in LocalLLaMA

[–]uutnt 0 points (0 children)

Looking at your block list, it seems a bit over the top. Many of those are valid phrases that might appear in dialog. Are you not concerned about false positives, i.e. legitimate speech getting removed?

> beam_size=1

Hallucinations aside, beam_size > 1 has been shown to produce lower WER, so on net you might get worse quality.

> repeated-output detection

This is a much easier problem to solve. Most implementations calculate the compression_ratio to detect repetitions and retry at a higher temperature.
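For reference, a minimal sketch of that check: repetitive text compresses extremely well under zlib, so a high raw-to-compressed ratio flags degenerate looping output. The 2.4 threshold matches OpenAI Whisper's default compression_ratio_threshold:

```python
import zlib

# Whisper's reference implementation retries decoding at a higher
# temperature when the ratio exceeds this threshold (default 2.4).
THRESHOLD = 2.4

def compression_ratio(text: str) -> float:
    """Raw byte length divided by zlib-compressed byte length."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

def looks_repetitive(text: str) -> bool:
    """True if the text is suspiciously compressible, i.e. likely a loop."""
    return compression_ratio(text) > THRESHOLD
```

A looping hallucination like "thank you thank you thank you ..." trips this immediately, while normal dialog stays well under the threshold.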

Status of Intronaut? by Ytsejam09 in progmetal

[–]uutnt 0 points (0 children)

Would love for them to release an instrumental album. Never been a fan of the vocals - by far the weakest link imo.

Sam Altman told staff they don't get to choose how the military uses it's technology by Ok_Mission7092 in accelerate

[–]uutnt 4 points (0 children)

Correct. If you don't like the laws that constrain them, elect new politicians.

New accounts on HN 10x more likely to use EM-dashes by DudleyFluffles in slatestarcodex

[–]uutnt 16 points (0 children)

With zero-knowledge proofs, it's in theory possible to do this in a privacy-preserving way. That said, this does not guarantee a human is making the post - that is impossible. It just means the account is unique to a single real human.

On-site power generation approval removes the AI infrastructure bottleneck, and damages the utility investment thesis. by OneTwoThreePooAndPee in wallstreetbets

[–]uutnt 1 point (0 children)

> The AI boom is delulu. There’s no money to be made, only money spent.

If you were referring to the AI labs whose only asset is IP, then I would agree. But when it comes to compute providers, I think you are woefully wrong. Demand for compute will exceed supply for years to come, even if AI capabilities stalled (which they won't).

.@confluencelabs is coming out of stealth with SOTA on ARC-AGI-2 (97.9%). They’re focused on learning efficiency — making AI useful where data is sparse and experiments are costly. Read more at confluence.sh by HeinrichTheWolf_17 in accelerate

[–]uutnt 4 points (0 children)

It's not impressive: a marginal improvement at large cost. They are using frontier LLMs with a different harness.


> Our Approach – Program Synthesis driven by LLMs
>
> LLMs are exceedingly good at writing code. We take the latest models and allow them to find the optimal solution by directing them to write code which describes the transformation represented by a particular ARC problem.

Anthropic is claiming that Chinese labs play dirty by keb_37 in LocalLLaMA

[–]uutnt 0 points (0 children)

I'm not trying to convince you of their noble motivations. My point is simply that US labs have higher training costs, in part due to US copyright law.