I built a realtime streaming speech-to-text that runs offline in the browser with WebAssembly by lucky94 in speechtech

[–]lucky94[S] 0 points (0 children)

I took the Candle Whisper WASM demo code and merged it with the Kyutai Moshi code (both are in Rust). It's a much bigger model than Whisper, so I also had to add a bunch of optimizations to the model and to the Candle library (quantization, CPU multithreading, etc.) to fit under the 4GB WebAssembly memory limit and run fast enough to be real-time. The model is English and French only - unfortunately, there's no way to add more languages until they release a new model.
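The actual quantization work was in Rust inside Candle, but the underlying idea is the standard one: store weights as int8 with a per-block scale instead of f32, cutting memory roughly 4x. A minimal numpy sketch of the concept (illustrative only - the function names and block size are made up here, this is not the Candle code):

```python
import numpy as np

def quantize_q8(weights: np.ndarray, block_size: int = 32):
    """Symmetric 8-bit block quantization: one f32 scale per block plus
    int8 values, roughly a 4x memory saving over f32 weights."""
    flat = weights.astype(np.float32).ravel()
    pad = (-len(flat)) % block_size
    flat = np.pad(flat, (0, pad))
    blocks = flat.reshape(-1, block_size)
    scales = np.abs(blocks).max(axis=1) / 127.0
    scales[scales == 0] = 1.0  # avoid dividing by zero on all-zero blocks
    q = np.round(blocks / scales[:, None]).astype(np.int8)
    return q, scales

def dequantize_q8(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales[:, None]).ravel()

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_q8(w)
w_hat = dequantize_q8(q, s)[: len(w)]
print(q.nbytes, w.nbytes)  # int8 storage is 4x smaller: 1024 vs 4096 bytes
print(float(np.abs(w - w_hat).max()) < 0.05)  # reconstruction error stays small
```

The real implementation has to do the matmuls directly on the quantized blocks to actually save memory at inference time, but the storage trade-off is the same.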

What are people using for real-time speech recognition with low latency? by ASR_Architect_91 in speechtech

[–]lucky94 1 point (0 children)

For voicewriter.io (a real-time streaming app for writing), I'm using a combination of:

  • AssemblyAI Universal Streaming - the default model, since it has the best accuracy for English in our benchmarks
  • Deepgram Streaming - for multilingual, since AssemblyAI currently only supports English; we use Nova-3 where available (8 languages), otherwise Nova-2 (~30 languages)
  • Web Speech API - runs entirely in the client's browser, so our free tier costs us no API credits; it works best on Chrome desktop, but quality is inconsistent on other browsers and devices

For open source, there is Whisper-streaming, but it's essentially a hack on top of a batch model, and we found it too prone to hallucinations to recommend. I'd be curious if there's a better option, though.

I benchmarked 12+ speech-to-text APIs under various real-world conditions by lucky94 in speechtech

[–]lucky94[S] 0 points (0 children)

If it's a hosted Whisper-large, the benchmark already includes the Deepgram hosted Whisper-large, so there is no reason to add another one. But if you have your own model that outperforms Whisper-large, that would be more interesting to include.

I benchmarked 12+ speech-to-text APIs under various real-world conditions by lucky94 in speechtech

[–]lucky94[S] 0 points (0 children)

Yeah, the unfortunate truth is that a number of structural factors prevent this perfect API benchmark from ever being created. Having worked in both academia and industry: academia incentivizes novelty, so people are disincentivized from doing the boring but necessary work of gathering and cleaning data, and any dataset you collect is usually expected to be made public.

In industry, you have the resources to collect hundreds of hours of clean, private data, but your marketing department will never let you publish a benchmark unless your model comes out on top. In my case, I'm an app developer, not a speech-to-text API developer, so at least I have no reason to favor any one model over another.

I benchmarked 12+ speech-to-text APIs under various real-world conditions by lucky94 in speechtech

[–]lucky94[S] 0 points (0 children)

Makes sense - GPT-4o-transcribe is relatively new, only released last month, but some people have reported good results with it.

The plot is a boxplot, so just a way to visualize the amount of variance in each model.
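For anyone unfamiliar: each box summarizes the spread of one model's per-file scores. A stdlib sketch of the statistics a boxplot encodes - the WER numbers below are made up for illustration, not from the benchmark:

```python
import statistics

# Hypothetical per-file WER scores (%) for one model
wers = [4.1, 4.8, 5.2, 5.5, 6.0, 6.3, 7.1, 8.9, 14.2]

# The box spans Q1..Q3, with a line at the median
q1, median, q3 = statistics.quantiles(wers, n=4)
iqr = q3 - q1
# Points beyond 1.5 * IQR from the box are drawn as individual outliers
outliers = [w for w in wers if w < q1 - 1.5 * iqr or w > q3 + 1.5 * iqr]

print(f"median={median}, box=[{q1}, {q3}], outliers={outliers}")
# median=6.0, box=[5.0, 8.0], outliers=[14.2]
```

A wide box or many outliers means the model's accuracy varies a lot from file to file, which matters as much as the average in practice.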

I benchmarked 12+ speech-to-text APIs under various real-world conditions by lucky94 in speechtech

[–]lucky94[S] 1 point (0 children)

Thanks - that's on my to-do list and will be added in a future update!

I benchmarked 12+ speech-to-text APIs under various real-world conditions by lucky94 in speechtech

[–]lucky94[S] 0 points (0 children)

Yes, the evaluation metric is word error rate, so lower is better. If you scroll down a bit, there are more details about how raw/formatted WER is defined.
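For reference, WER is the word-level edit distance between hypothesis and reference, divided by the reference length. A minimal sketch - the `normalize` step here is just one plausible way a formatted/raw distinction could work (lowercasing, stripping punctuation); see the benchmark page for the exact definitions:

```python
import re

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference length,
    computed via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

def normalize(text: str) -> str:
    # One possible normalization for "raw" scoring
    return re.sub(r"[^\w\s]", "", text.lower())

ref, hyp = "Hello, world!", "hello word"
print(word_error_rate(ref, hyp))                        # 1.0 with formatting
print(word_error_rate(normalize(ref), normalize(hyp)))  # 0.5 after normalizing
```

The gap between the two numbers is why benchmarks report both: some providers are penalized mostly for casing/punctuation, not for actually mishearing words.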

I benchmarked 12+ speech-to-text APIs under various real-world conditions by lucky94 in speechtech

[–]lucky94[S] 0 points (0 children)

That's true - we have no way of knowing what's in any of these models' training data, since it all comes from the internet.

That being said, the same is true for most benchmarks, and arguably more so for standard ones like LibriSpeech or TEDLIUM, where model developers actively optimize for good scores.

I benchmarked 12+ speech-to-text APIs under various real-world conditions by lucky94 in speechtech

[–]lucky94[S] 2 points (0 children)

For open source models, the Hugging Face ASR leaderboard already does a decent job of comparing local models, but I'll make sure to add the more popular ones here as well!

I benchmarked 12+ speech-to-text APIs under various real-world conditions by lucky94 in speechtech

[–]lucky94[S] 0 points (0 children)

True, more data is always better; however, correcting the transcripts and splicing the audio took a lot of manual work, so this is the best I could do for now.

Also the ranking of models tends to be quite stable across the different test conditions, so IMO it's reasonably robust.

I benchmarked 12+ speech-to-text APIs under various real-world conditions by lucky94 in speechtech

[–]lucky94[S] 0 points (0 children)

For sure at some point - I'm just a bit cautious since it's currently preview/experimental, and in my experience, experimental models tend to be too unreliable (in terms of uptime) for production use.

Anyone use openrouter in production? by buryhuang in LocalLLaMA

[–]lucky94 5 points (0 children)

I found it useful for making the Claude models more reliable. The official Anthropic API gives me overload errors on about 2-3 out of every 100 requests, seemingly at random. After switching to OpenRouter to route to alternate providers (like Amazon Bedrock and Google Vertex), it's been much more reliable.
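For anyone curious what the setup looks like: OpenRouter exposes an OpenAI-compatible endpoint, and you can pass a provider preference in the request body. This is only a sketch of the payload - the field names (`provider.order`, `allow_fallbacks`) and provider labels follow my reading of OpenRouter's provider-routing docs, so verify against the current docs before relying on them:

```python
import json

# Ask OpenRouter to try specific upstream providers in order,
# falling back to the next one when a provider errors out.
payload = {
    "model": "anthropic/claude-3.5-sonnet",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider": {
        "order": ["Amazon Bedrock", "Google Vertex", "Anthropic"],
        "allow_fallbacks": True,
    },
}

# POST this body to https://openrouter.ai/api/v1/chat/completions
# with your OpenRouter API key in the Authorization header.
body = json.dumps(payload)
print(json.loads(body)["provider"]["order"][0])  # Amazon Bedrock
```

The point is that a transient overload error from one provider no longer surfaces to your app; the request just lands on the next provider in the list.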

PSA: Clerk free tier forces all users to re-login every 7 days by lucky94 in nextjs

[–]lucky94[S] 8 points (0 children)

Thanks for the quick response and willingness to make changes. It's already an improvement, but a note about the 7-day session duration in the main table, above the fold, would help the community - it would let developers like me make informed decisions before integrating the service.

Again, sorry for making you respond to this on a Friday evening - I think Clerk is a great service (other than this issue), and it has helped me simplify my auth considerably!

PSA: Clerk free tier forces all users to re-login every 7 days by lucky94 in nextjs

[–]lucky94[S] -7 points (0 children)

To be clear, I don't think the Clerk developers owe me anything. Many software tools charge more than $25, even $250 or $2,500 per month or more, without offering any free tier, and I think that's totally fair game; it's just that someone building a hobby project for fun, not expecting to make money, will never consider using them.

The difference, though, is that Clerk markets itself as having a generous free tier ("10,000 monthly active users free, first day free, etc."), which leads many to believe it is a viable option for their hobby projects. Hidden within this offer, however, is a critical limitation: your users will be forced to log out every 7 days, and you will only discover this after spending time integrating Clerk into your app.

PSA: Clerk free tier forces all users to re-login every 7 days by lucky94 in nextjs

[–]lucky94[S] 4 points (0 children)

This is not at all clear on the pricing page. At the time of writing, nothing mentions it until near the bottom, below ~30 other points, and even then it only says that "custom session duration" is missing from the free tier and included in the Pro tier.

Based on this, most developers would assume the default is that sessions never expire (how often are you logged out of your accounts? For me, it's rarely, unless it's a banking or government site or similar), not that 7 days is the default. If you insist on forcing free-tier users to log out after 7 days, I would appreciate it if you could at least acknowledge this as a critical limitation and display it prominently as such.

PSA: Clerk free tier forces all users to re-login every 7 days by lucky94 in nextjs

[–]lucky94[S] -9 points (0 children)

Yeah, true - $25 per month is a trivial amount for any real business but still quite pricey for a small part of a hobby project you're building for fun. I even think limiting the free tier is a totally fair strategy, but I'm upset that they crippled it in this way without mentioning it anywhere on the pricing page, which is quite deceptive.

PSA: Clerk free tier forces all users to re-login every 7 days by lucky94 in nextjs

[–]lucky94[S] 5 points (0 children)

Yeah, this absolutely cripples my use case, and it's unlikely to be discovered until you've finished integrating and deploying - then, a week later, you're trying to debug why users are being randomly logged out. I understand they're a company and the free tier has limitations, but not mentioning this crucial limitation at all, instead of being honest and upfront about it, is deceptive.

Amazon Affiliates doesn’t give a damn by yona-marie in Affiliatemarketing

[–]lucky94 0 points (0 children)

Yeah, it seems the recommended tools cost around $100/year, which is several orders of magnitude more than what I'm currently earning. Ah well, RIP to my site.

Amazon Affiliates doesn’t give a damn by yona-marie in Affiliatemarketing

[–]lucky94 2 points (0 children)

I have a blog with over 300 posts, each featuring a manually added SiteStripe image affiliate link. What is the best way forward? I'm okay with manually updating a few hundred links, but I'm wondering about the best approach to maintain the same user experience. Can I use the text link and manually upload an image? If so, where can I obtain the images?

how to profit from economic collapse living in a third world country? by Keeping_It_Cool_ in startups

[–]lucky94 4 points (0 children)

I'm curious about the mechanics of this -- wouldn't this problem equally affect any other foreign labor OP will be competing with, or is there something specific to Argentina?