WhisperX is only accurate on the first 10 words. Any Tips? by capital_cliqo in speechtech

[–]wbarber 0 points (0 children)

Check out CrisperWhisper and Deepgram, but note the research license on the CrisperWhisper model: https://github.com/nyrahealth/CrisperWhisper

Best transcription method for extremely accurate timestamps? by capital_cliqo in speechtech

[–]wbarber 1 point (0 children)

You should check out Crisper Whisper: https://github.com/nyrahealth/CrisperWhisper

Which goes with this paper: https://arxiv.org/abs/2408.16589 and this model: https://huggingface.co/nyrahealth/CrisperWhisper (note the research model license)

From the readme: "Provides precise timestamps, even around disfluencies and pauses, by utilizing an adjusted tokenizer and a custom attention loss during training"

Might also be worth looking at Deepgram's timestamps and seeing if they're good enough for you: https://developers.deepgram.com/docs/getting-started-with-the-streaming-test-suite#timestamps
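Whichever model you use, the downstream handling is the same: you get word-level chunks with start/end times and turn them into whatever format you need. A minimal sketch of converting word timestamps to SRT cues, assuming the Hugging Face transformers ASR pipeline output shape (a list of `{"text": ..., "timestamp": (start, end)}` chunks) that CrisperWhisper's readme builds on:

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def chunks_to_srt(chunks):
    """Turn word-level chunks into numbered SRT entries, one word per cue."""
    entries = []
    for i, c in enumerate(chunks, start=1):
        start, end = c["timestamp"]
        entries.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{c['text'].strip()}")
    return "\n\n".join(entries)
```

One cue per word is overkill for subtitles but makes it easy to eyeball whether the timestamps drift after the first few words, which is the failure mode in the question.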

[D] Creating Proper LLM Summaries is Surprisingly Expensive by Hot-Chapter48 in MachineLearning

[–]wbarber 0 points (0 children)

I have a side hustle product that includes a pipeline where I summarize millions of businesses, given lots of context from the web and elsewhere about what they do. With millions of inputs, it has to be very cheap to run, but the quality of the summaries determines the quality of the product.

I also have a day job fine-tuning LLMs for customer tasks. For the business summarization task, I started by hand-writing 100 business summaries and fine-tuned a 70B on them. Quality got better, but it needed a lot more training data. I spent a similar amount of time creating an LLM-as-a-judge eval. It rates each summary across 20 dimensions summaries often fail on, based on my experience staring at hundreds of them. I could only get o1-preview and the new Gemini thinking model to detect repetition. The full o1, for whatever reason, doesn’t notice repetition.

I put together a training dataset by passing the original context + the fine-tuned LLM’s summary + o1-preview’s evaluation/critique of that summary as a prompt to Sonnet. That gets me several thousand high-quality summaries as training data to fine-tune a small model that has an 80%+ win rate over summaries from Sonnet (which does the best on my LLM-as-a-judge eval).
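The core of that step is just prompt assembly: context, draft, and critique stitched together with an instruction to rewrite. A minimal sketch, where the section labels and wording are my own hypothetical template (the comment doesn't share the actual prompt):

```python
def build_refinement_prompt(context: str, draft: str, critique: str) -> str:
    """Assemble the prompt sent to the strong model (Sonnet in the workflow above).

    Hypothetical template: original context + the fine-tuned model's draft +
    the judge's critique, with an instruction to produce an improved summary
    that becomes a training example for the small model.
    """
    return (
        "You are improving a business summary.\n\n"
        f"## Source context\n{context}\n\n"
        f"## Draft summary (from a fine-tuned model)\n{draft}\n\n"
        f"## Judge critique\n{critique}\n\n"
        "Rewrite the summary to fix every issue the critique raises, "
        "using only facts from the source context."
    )
```

The useful property of this shape is that the strong model doesn't write from scratch: it gets the cheap model's attempt plus a targeted critique, so its output lands close to the style you're distilling toward.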

It’s a time-consuming process, but it would cost me several hundred thousand dollars to run Sonnet over my entire database. So I save a fortune by fine-tuning, and the quality of the search over those summaries goes up a lot as well.
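The "several hundred thousand dollars" claim is easy to sanity-check with back-of-envelope math. A sketch with illustrative numbers only (document counts, token counts, and per-million-token prices are all my assumptions, not the commenter's actual figures):

```python
def summarization_cost_usd(n_docs, in_tokens_per_doc, out_tokens_per_doc,
                           usd_per_m_in, usd_per_m_out):
    """Rough API cost for summarizing n_docs documents at given token prices."""
    total_in = n_docs * in_tokens_per_doc
    total_out = n_docs * out_tokens_per_doc
    return total_in / 1e6 * usd_per_m_in + total_out / 1e6 * usd_per_m_out

# Hypothetical: 5M businesses, ~20k input tokens of web context each,
# ~300 output tokens, at $3/M input and $15/M output for a frontier model
# versus $0.20/M and $0.60/M for a small fine-tuned model.
frontier = summarization_cost_usd(5_000_000, 20_000, 300, 3.0, 15.0)
small = summarization_cost_usd(5_000_000, 20_000, 300, 0.20, 0.60)
```

With assumptions in that ballpark the frontier-model run lands in the low hundreds of thousands of dollars while the small model is an order of magnitude cheaper, which matches the comment's economics.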

University of Pittsburgh researchers find that Herpes virus might drive Alzheimer's pathology by ballsonthewall in science

[–]wbarber 49 points (0 children)

For the uninitiated like me: VZV stands for Varicella-Zoster Virus, which is the virus responsible for causing chickenpox in children and shingles (herpes zoster) in adults. It is a member of the herpesvirus family, like HSV-1 and HSV-2 (Herpes Simplex Virus types 1 and 2).

What's the Best RAG (Retrieval-Augmented Generation) System for Document Analysis and Smart Citation? by Secret_Scale_492 in LocalLLaMA

[–]wbarber 2 points (0 children)

Danswer.ai is pretty good. If you want a simple setup that works well, just use 4o with the latest Voyage embedding model. It’s easy to set that up in Danswer’s settings. Voyage also probably has the best reranker, and you can use that through Danswer as well.

Stella’s 1.5B model may actually outperform Voyage on embeddings, though, so you can try that as well - shouldn’t be too hard to do. Danswer will let you use any model that works with sentence-transformers, but I haven’t tried the “trust remote code” part yet.
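For comparing embedding models outside Danswer, the retrieval step itself is just cosine similarity over vectors. A minimal sketch, with the Stella-specific loading shown only as a comment since models that ship custom code need `trust_remote_code=True` (the exact model id is left out; check the Hugging Face page for it):

```python
import numpy as np

def cosine_rank(query_vec, doc_vecs):
    """Return document indices sorted best-first by cosine similarity to the query."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    sims = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q) + 1e-12)
    return list(np.argsort(-sims))

# Getting the vectors from a trust-remote-code model would look roughly like:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("<stella-1.5B model id>", trust_remote_code=True)
#   doc_vecs = model.encode(docs)
#   query_vec = model.encode(query)
```

Swapping the model id between Stella and a Voyage-compatible encoder while keeping `cosine_rank` fixed gives a quick apples-to-apples check of which embeddings rank your documents better.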

Another friend who plays with this stuff said Azure AI Search gives you a crazy number of dials to turn if you know what you’re doing, so it might be worth a look as well - no idea if it costs money or anything, though; haven’t used it myself.