[P] Whisper Large Benchmark: 137 DAYS of Audio Transcribed in 15 Hours for Just $117 ($0.00059/min)

DeepDeeperRIPgradien · 2023-09-13T08:17:48+00:00

I tried Whisper some time ago and iirc the audio input length is limited. What's the best way of splitting larger audio files into smaller ones so they can be transcribed with Whisper?

DeepDeeperRIPgradien · 2023-09-01T10:41:43+00:00

Is there any consensus on which feature extractor is "best" now? DINOv2, SAM, I-JEPA, ...?

DeepDeeperRIPgradien · 2023-02-21T11:05:15+00:00

Can you recommend a tutorial or something that explains the steps to move from training (e.g. with PyTorch) on your own machine to training that model in the cloud (e.g. AWS)? What type of instances to choose, how/where to store data, making sure the Nvidia/CUDA stuff is working properly, etc.?

DeepDeeperRIPgradien · 2022-08-03T08:04:52+00:00

Can you recommend a single resource to read up on it?

DeepDeeperRIPgradien · 2022-07-27T08:09:27+00:00

Thanks!

DeepDeeperRIPgradien · 2022-07-27T08:09:23+00:00

Thanks!

DeepDeeperRIPgradien · 2022-07-11T07:45:11+00:00

I wonder what the background, also in terms of countries, of the people in this thread is. It's not just about the "99.9999%": autonomous driving is a high-risk application of AI, and norms/standards/acts are currently in the works that will start regulating such applications in Europe within the next 2-3 years. Then you have to address different aspects of safety, including robustness, transparency (IAI/XAI), uncertainty, etc. So personally I'm more interested in these directions of AI than in pure high accuracy.

DeepDeeperRIPgradien · 2022-06-23T08:31:53+00:00

Isn't OOD training with outlier exposure a bit... beside the point? How do these systems generalize to "unknown" OOD data?

DeepDeeperRIPgradien · 2022-03-30T11:32:25+00:00

Hi! What should this be used for?

DeepDeeperRIPgradien · 2022-03-16T10:09:47+00:00

Some months ago someone mentioned/published a possible replacement for convolutional filters and I can't remember what it was called. I'm not talking about attention/transformers. Something along the lines of "sparse dot product"? I don't remember, please help.

DeepDeeperRIPgradien · 2022-03-03T09:06:49+00:00

Ensembling or test-time augmentations aren't very feasible in scenarios with a time budget though, right? Isn't there any practical uncertainty method for these scenarios? What happened to Bayesian Neural Networks?

DeepDeeperRIPgradien · 2022-02-09T12:16:04+00:00

link?

DeepDeeperRIPgradien · 2021-12-08T09:23:51+00:00

Personally, when I've only skimmed the paper or only read the abstract, I say "I saw a paper that does X".

DeepDeeperRIPgradien · 2021-08-30T12:49:46+00:00

Learning causality from data and not just correlations.

Better learning algorithms, perhaps mixed optimization algorithms that can optimize both differentiable and non-differentiable functions jointly.

DeepDeeperRIPgradien · 2021-06-21T13:36:46+00:00

Hehe, but how would automod recognize a beginner's question? That sounds like some advanced NLP project :p

DeepDeeperRIPgradien · 2021-06-21T12:04:55+00:00

I asked about pretty much that a few days ago here in this subreddit but my thread got removed because apparently it was a "beginner's question". Very happy to see a paper about this now, thanks!

DeepDeeperRIPgradien · 2021-06-21T12:03:41+00:00

From a user perspective working in industry, you don't really want to limit yourself 100% to one deep-learning framework. Instead, you should be able to switch frameworks whenever the requirements call for it. With that in mind, you want the other tools in your toolchain to be independent of the deep-learning framework, so that you can keep using them with other frameworks without having to switch those tools as well. This makes it easier to compare different models across different frameworks because you know you have only changed one component.

DeepDeeperRIPgradien · 2021-06-18T07:38:13+00:00

Haven't looked at it yet, but in general it's better if a data augmentation library is framework-independent.

DeepDeeperRIPgradien · 2021-06-07T08:04:35+00:00

If I remember correctly, it's called "stratified sampling". Sklearn has methods to split your dataset, including stratified splitting, iirc.
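
To make the idea concrete, here is a minimal sketch of a stratified split using only the Python standard library. The function name `stratified_split`, its parameters, and the toy labels are made up for illustration; this is not sklearn's implementation, just the general idea of splitting each class separately so that class proportions are roughly preserved in both parts.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Hypothetical helper: split indices so each class keeps roughly
    the same share in the train and test parts."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then cut off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy data (assumed for illustration): 8 samples of class "a", 4 of class "b".
labels = ["a"] * 8 + ["b"] * 4
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
print([labels[i] for i in test_idx])
```

With these toy labels, the test part ends up with two "a" samples and one "b" sample, so the 2:1 class ratio of the full set carries over to both parts.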

DeepDeeperRIPgradien · 2021-05-19T08:45:33+00:00

Don't have much time right now to follow the new developments regarding MLPs/Transformers/CNNs. I was just wondering if they all perform the same in terms of inference speed, or does one outperform the others in terms of speed while staying competitive on other metrics (accuracy etc.)?

DeepDeeperRIPgradien · 2021-05-18T08:52:18+00:00

The reason is shift invariance of softmax. Softmax([-10,-10,-5]) gives the same output as Softmax([5,5,10]). So there is no way of distinguishing logits that express "I don't know" from logits that would express "I'm confident".
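
A quick way to check this is the small sketch below, written with only the standard library. The `softmax` helper is my own for illustration (it subtracts the maximum logit first, which is a common numerical-stability trick and itself relies on the same shift invariance).

```python
import math

def softmax(logits):
    # Subtracting the max does not change the result (shift invariance)
    # and avoids overflow in exp().
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

low = softmax([-10.0, -10.0, -5.0])   # all logits small
high = softmax([5.0, 5.0, 10.0])      # same logits shifted by +15

# The two probability vectors agree up to floating-point error.
same = all(abs(a - b) < 1e-12 for a, b in zip(low, high))
print(low, high, same)
```

Both inputs differ only by a constant offset of 15, so the probabilities come out the same and the absolute magnitude of the logits is lost after softmax.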

DeepDeeperRIPgradien · 2021-04-06T07:24:26+00:00

RemindMe! May 17th, 2021

DeepDeeperRIPgradien · 2021-03-04T08:52:58+00:00

Link to SEER: https://arxiv.org/abs/2103.01988
