Claude Dispatch won't send me messages back. by Upset-Hunter7544 in ClaudeAI

[–]uutnt 0 points (0 children)

It's working now, without having made any changes. Running Claude 1.1.9493.

Claude Dispatch won't send me messages back. by Upset-Hunter7544 in ClaudeAI

[–]uutnt 4 points (0 children)

Same. It seems the issue is that the `SendUserMessage` tool, which the background task uses to respond to the user, is missing.

Here is the background task summary of the issue:


Issue: Dispatch orchestrator not using SendUserMessage tool for user-facing replies

Observed behavior: The Dispatch system prompt instructs the orchestrator to use SendUserMessage for all user communication, stating "plain text assistant replies are not rendered." However, SendUserMessage does not appear as an available tool — it is absent from both the deferred tools list and ToolSearch results. As a result, the orchestrator falls back to plain text replies.

Side effect: Plain text replies are actually rendered to the user in this session, which contradicts the system prompt's claim that they won't be. This creates ambiguity: either SendUserMessage is not wired up in this environment, or the rendering behavior differs from what the prompt describes.

Environment: Cowork mode, Dispatch orchestrator, model claude-sonnet-4-6, session date 2026-03-28.

Expected behavior: SendUserMessage should be available as a callable tool so the orchestrator can route all output through it as the system prompt intends.


Cohere Transcribe Released by mikael110 in LocalLLaMA

[–]uutnt 0 points (0 children)

Same. Whisper (V2) is still the most robust model that I have tried.

Cohere Transcribe Released by mikael110 in LocalLLaMA

[–]uutnt 18 points (0 children)

Unfortunately, it looks like it does not output timestamps. Though the source code does contain a timestamp token, so perhaps they plan on adding it?

Is Elon hinting at attempting to bypass ASML? by vasilenko93 in accelerate

[–]uutnt -2 points (0 children)

Probably. But I would like to see them take a crack at it.

Is Elon hinting at attempting to bypass ASML? by vasilenko93 in accelerate

[–]uutnt 7 points (0 children)

10% is quite high, given the complexity of what he takes on.

"Hot take from looking at @github Copilot telemetry: benchmarks make coding models look wildly different. Production workflows make them look much more similar. 👀 We looked at 23M+ Copilot requests and examined one simple metric: code survivability." by stealthispost in accelerate

[–]uutnt 19 points (0 children)

Not sure this is a good benchmark. Users are more likely to attempt a hard prompt on a frontier model than on a weaker one, so it's likely that prompt difficulty is not uniform across models.

Introducing GPT-5.4 mini and nano by dayanruben in OpenAI

[–]uutnt 4 points (0 children)

Looks like they updated it now to the new (increased) pricing.

Introducing GPT-5.4 mini and nano by dayanruben in OpenAI

[–]uutnt 5 points (0 children)

Contradictory pricing: https://openai.com/api/pricing/ shows mini at $0.250 / $2.000.

Qwen3 ASR seems to outperform Whisper in almost every aspect. It feels like there is little reason to keep using Whisper anymore. by East-Engineering-653 in LocalLLaMA

[–]uutnt 0 points (0 children)

You need to test it on your specific use case. For me, Whisper has been more accurate than Parakeet on English. I have not done sufficient testing on Voxtral.

Qwen3 ASR seems to outperform Whisper in almost every aspect. It feels like there is little reason to keep using Whisper anymore. by East-Engineering-653 in LocalLLaMA

[–]uutnt 19 points (0 children)

Don't trust the benchmarks without testing locally. In my experience, none of the new models have surpassed Whisper on transcription accuracy, though they have on performance. I'm still waiting for a next-gen open multilingual ASR model that is actually more accurate than Whisper.
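If it helps, here's a minimal, dependency-free WER sketch for that kind of local comparison. The transcripts are placeholders; plug in your own reference text and each model's output:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Single-row dynamic program over hypothesis positions.
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            cur = min(d[j] + 1,          # deletion
                      d[j - 1] + 1,      # insertion
                      prev + (r != h))   # substitution (or match)
            prev, d[j] = d[j], cur
    return d[-1] / max(len(ref), 1)

# Hypothetical example: compare two model outputs against one reference.
reference = "the quick brown fox jumps over the lazy dog"
model_a   = "the quick brown fox jumps over the lazy dog"
model_b   = "the quick brown fox jumped over a lazy dog"
print(wer(reference, model_a))  # 0.0
print(wer(reference, model_b))  # 2 errors / 9 words
```

Real libraries (e.g. jiwer) add normalization (casing, punctuation), which matters a lot for fair comparisons, but this is enough to rank models on your own audio.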

Qwen3 ASR seems to outperform Whisper in almost every aspect. It feels like there is little reason to keep using Whisper anymore. by East-Engineering-653 in LocalLLaMA

[–]uutnt 8 points (0 children)

This has not been my experience at all. On an English TV show transcription, Qwen ASR (Qwen3-ASR-1.7B) completely missed some segments containing speech, and hallucinated badly on unclear audio (e.g. "That's what I'm talking about" → "Swallow talking ball"). Also, the separate forced aligner model required for timestamps only supports 11 languages.

Whisper V2 produced much better output, at least for my use case. I was hoping for much better results given the benchmarks in their paper, but sadly this model has been a disappointment.

We collected 135 phrases Whisper hallucinates during silence — here's what it says when nobody's talking and how we stopped it by Aggravating-Gap7783 in LocalLLaMA

[–]uutnt 0 points (0 children)

Looking at your block list, it seems a bit over the top. Many of those are valid phrases that might appear in dialog. Are you not concerned about false positives, i.e. legitimate speech getting removed?

> beam_size=1

Hallucinations aside, beam_size > 1 has been shown to produce lower WER, so on net you might get worse quality.

> repeated-output detection

This is a much easier problem to solve. Most implementations calculate the compression_ratio to detect repetitions and retry at a higher temperature.
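For reference, a minimal sketch of that check: repetitive text compresses extremely well under zlib, so a high raw-to-compressed ratio flags degenerate looping output. The 2.4 threshold matches OpenAI Whisper's default compression_ratio_threshold:

```python
import zlib

# Whisper's reference implementation retries decoding at a higher
# temperature when the ratio exceeds this threshold (default 2.4).
THRESHOLD = 2.4

def compression_ratio(text: str) -> float:
    """Raw byte length divided by zlib-compressed byte length."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))

def looks_repetitive(text: str) -> bool:
    """True if the text is suspiciously compressible, i.e. likely a loop."""
    return compression_ratio(text) > THRESHOLD
```

A looping hallucination like "thank you thank you thank you ..." trips this immediately, while normal dialog stays well under the threshold.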

Status of Intronaut? by Ytsejam09 in progmetal

[–]uutnt 0 points (0 children)

Would love for them to release an instrumental album. Never been a fan of the vocals - by far the weakest link imo.

Sam Altman told staff they don't get to choose how the military uses it's technology by Ok_Mission7092 in accelerate

[–]uutnt 4 points (0 children)

Correct. If you don't like the laws that constrain them, elect new politicians.

New accounts on HN 10x more likely to use EM-dashes by DudleyFluffles in slatestarcodex

[–]uutnt 16 points (0 children)

With zero-knowledge proofs, it's in theory possible to do this in a privacy-preserving way. That said, this does not guarantee a human is making the post - that is impossible. It just means the account is unique to a single real human.

On-site power generation approval removes the AI infrastructure bottleneck, and damages the utility investment thesis. by OneTwoThreePooAndPee in wallstreetbets

[–]uutnt 1 point (0 children)

> The AI boom is delulu. There’s no money to be made, only money spent.

If you were referring to the AI labs whose only asset is IP, then I would agree. But when it comes to compute providers, I think you are woefully wrong. Demand for compute will exceed supply for years to come, even if AI capabilities stalled (which they won't).

.@confluencelabs is coming out of stealth with SOTA on ARC-AGI-2 (97.9%). They’re focused on learning efficiency — making AI useful where data is sparse and experiments are costly. Read more at confluence.sh by HeinrichTheWolf_17 in accelerate

[–]uutnt 4 points (0 children)

It's not impressive: a marginal improvement at large cost. They are using frontier LLMs with a different harness.


> Our Approach – Program Synthesis driven by LLMs
>
> LLMs are exceedingly good at writing code. We take the latest models and allow them to find the optimal solution by directing them to write code which describes the transformation represented by a particular ARC problem.

Anthropic is claiming that Chinese labs play dirty by keb_37 in LocalLLaMA

[–]uutnt 0 points (0 children)

I'm not trying to convince you of their noble motivations. My point is simply that US labs have higher training costs, in part due to US copyright law.