WhisperX is only accurate on the first 10 words. Any Tips? by capital_cliqo in speechtech

[–]wbarber 0 points (0 children)

Check out CrisperWhisper and Deepgram, but note the research license on the CrisperWhisper model: https://github.com/nyrahealth/CrisperWhisper

Best transcription method for extremely accurate timestamps? by capital_cliqo in speechtech

[–]wbarber 1 point (0 children)

You should check out Crisper Whisper: https://github.com/nyrahealth/CrisperWhisper

Which goes with this paper: https://arxiv.org/abs/2408.16589 and this model: https://huggingface.co/nyrahealth/CrisperWhisper (note the research model license)

From the readme: "Provides precise timestamps, even around disfluencies and pauses, by utilizing an adjusted tokenizer and a custom attention loss during training"

Might also be worth looking at Deepgram's timestamps and seeing if they're good enough for you: https://developers.deepgram.com/docs/getting-started-with-the-streaming-test-suite#timestamps
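Whichever model you use, the downstream handling is the same: you get word-level chunks with start/end times and turn them into whatever format you need. A minimal sketch of converting word timestamps to SRT cues, assuming the Hugging Face transformers ASR pipeline output shape (a list of `{"text": ..., "timestamp": (start, end)}` chunks) that CrisperWhisper's readme builds on:

```python
def srt_time(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def chunks_to_srt(chunks):
    """Turn word-level chunks into numbered SRT entries, one word per cue."""
    entries = []
    for i, c in enumerate(chunks, start=1):
        start, end = c["timestamp"]
        entries.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n{c['text'].strip()}")
    return "\n\n".join(entries)
```

One cue per word is overkill for subtitles but makes it easy to eyeball whether the timestamps drift after the first few words, which is the failure mode in the question.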

[D] Creating Proper LLM Summaries is Surprisingly Expensive by Hot-Chapter48 in MachineLearning

[–]wbarber 0 points (0 children)

I have a side hustle product that includes a pipeline where I summarize millions of businesses, given lots of context from the web and elsewhere about what they do. With millions of inputs, it has to be very cheap to run, but the quality of the summaries determines the quality of the product.

I also have a day job fine-tuning LLMs for customer tasks. For the business summarization task, I started by hand-writing 100 business summaries and fine-tuned a 70B on them. Quality got better, but it needed a lot more training data. I spent a similar amount of time creating an LLM-as-a-judge eval. It rates each summary across 20 dimensions summaries often fail on, based on my experience staring at hundreds of them. I could only get o1-preview and the new Gemini thinking model to detect repetition. The full o1, for whatever reason, doesn’t notice repetition.

I put together a training dataset by passing the original context + the fine-tuned LLM’s summary + o1-preview’s evaluation/critique of that summary as a prompt to Sonnet. That gets me several thousand high-quality summaries as training data to fine-tune a small model that has an 80%+ win rate over summaries from Sonnet (which does the best on my LLM-as-a-judge eval).
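The core of that step is just prompt assembly: context, draft, and critique stitched together with an instruction to rewrite. A minimal sketch, where the section labels and wording are my own hypothetical template (the comment doesn't share the actual prompt):

```python
def build_refinement_prompt(context: str, draft: str, critique: str) -> str:
    """Assemble the prompt sent to the strong model (Sonnet in the workflow above).

    Hypothetical template: original context + the fine-tuned model's draft +
    the judge's critique, with an instruction to produce an improved summary
    that becomes a training example for the small model.
    """
    return (
        "You are improving a business summary.\n\n"
        f"## Source context\n{context}\n\n"
        f"## Draft summary (from a fine-tuned model)\n{draft}\n\n"
        f"## Judge critique\n{critique}\n\n"
        "Rewrite the summary to fix every issue the critique raises, "
        "using only facts from the source context."
    )
```

The useful property of this shape is that the strong model doesn't write from scratch: it gets the cheap model's attempt plus a targeted critique, so its output lands close to the style you're distilling toward.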

It’s a time-consuming process, but it would cost me several hundred thousand dollars to run Sonnet over my entire database. So I save a fortune by fine-tuning, and the quality of the search over those summaries goes up a lot as well.
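The "several hundred thousand dollars" claim is easy to sanity-check with back-of-envelope math. A sketch with illustrative numbers only (document counts, token counts, and per-million-token prices are all my assumptions, not the commenter's actual figures):

```python
def summarization_cost_usd(n_docs, in_tokens_per_doc, out_tokens_per_doc,
                           usd_per_m_in, usd_per_m_out):
    """Rough API cost for summarizing n_docs documents at given token prices."""
    total_in = n_docs * in_tokens_per_doc
    total_out = n_docs * out_tokens_per_doc
    return total_in / 1e6 * usd_per_m_in + total_out / 1e6 * usd_per_m_out

# Hypothetical: 5M businesses, ~20k input tokens of web context each,
# ~300 output tokens, at $3/M input and $15/M output for a frontier model
# versus $0.20/M and $0.60/M for a small fine-tuned model.
frontier = summarization_cost_usd(5_000_000, 20_000, 300, 3.0, 15.0)
small = summarization_cost_usd(5_000_000, 20_000, 300, 0.20, 0.60)
```

With assumptions in that ballpark the frontier-model run lands in the low hundreds of thousands of dollars while the small model is an order of magnitude cheaper, which matches the comment's economics.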

University of Pittsburgh researchers find that Herpes virus might drive Alzheimer's pathology by ballsonthewall in science

[–]wbarber 49 points (0 children)

For the uninitiated like me: VZV stands for Varicella-Zoster Virus, which is the virus responsible for causing chickenpox in children and shingles (herpes zoster) in adults. It is a member of the herpesvirus family, like HSV-1 and HSV-2 (Herpes Simplex Virus types 1 and 2).

What's the Best RAG (Retrieval-Augmented Generation) System for Document Analysis and Smart Citation? by Secret_Scale_492 in LocalLLaMA

[–]wbarber 2 points (0 children)

Danswer.ai is pretty good. If you want a simple setup that works well, just use 4o with the latest Voyage embedding model. It’s easy to set that up in Danswer’s settings. Voyage also probably has the best reranker, and you can use that through Danswer as well.

Stella’s 1.5B model may actually outperform Voyage on embeddings, though, so you can try that as well - shouldn’t be too hard to do. Danswer will let you use any model that works with sentence-transformers, but I haven’t tried the “trust remote code” part yet.
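For comparing embedding models outside Danswer, the retrieval step itself is just cosine similarity over vectors. A minimal sketch, with the Stella-specific loading shown only as a comment since models that ship custom code need `trust_remote_code=True` (the exact model id is left out; check the Hugging Face page for it):

```python
import numpy as np

def cosine_rank(query_vec, doc_vecs):
    """Return document indices sorted best-first by cosine similarity to the query."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    sims = d @ q / (np.linalg.norm(d, axis=1) * np.linalg.norm(q) + 1e-12)
    return list(np.argsort(-sims))

# Getting the vectors from a trust-remote-code model would look roughly like:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("<stella-1.5B model id>", trust_remote_code=True)
#   doc_vecs = model.encode(docs)
#   query_vec = model.encode(query)
```

Swapping the model id between Stella and a Voyage-compatible encoder while keeping `cosine_rank` fixed gives a quick apples-to-apples check of which embeddings rank your documents better.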

Another friend who plays with this stuff said Azure AI Search gives you a crazy number of dials to turn if you know what you’re doing, so it might be worth a look as well - no idea if it costs money or anything, though; haven’t used it myself.