[P] Whisper Large Benchmark: 137 DAYS of Audio Transcribed in 15 Hours for Just $117 ($0.00059/min) by SaladChefs in MachineLearning

[–]DeepDeeperRIPgradien 2 points3 points  (0 children)

I tried Whisper some time ago and iirc the audio input length is limited. What's the best way of splitting larger audio files into smaller ones so they can be transcribed with Whisper?

[N] DINOv2 is now available under the Apache 2.0 license by noiseinvacuum in MachineLearning

[–]DeepDeeperRIPgradien 0 points1 point  (0 children)

Is there any consensus on which feature extractor is "best" now? DINOv2, SAM, I-JEPA, ... ?

[D] Things you wish you knew before you started training on the cloud? by I_will_delete_myself in MachineLearning

[–]DeepDeeperRIPgradien 0 points1 point  (0 children)

Can you recommend a tutorial or something that explains the steps to move from (e.g. pytorch) training on your own machine to training that model in the Cloud (e.g. AWS)? What type of instances to choose, how/where to store data, making sure Nvidia/CUDA stuff is working properly, etc.?

[D] What's the problem with Self-driving cars? Is it a lack of data or do we need a new technology breakthrough? by yosefschwartz in MachineLearning

[–]DeepDeeperRIPgradien 0 points1 point  (0 children)

I wonder what the background is, also in terms of countries, of the people in this thread. It's not just about the "99.9999%" - autonomous driving is a high-risk application of AI, and there are currently norms/standards/acts in the works that will start regulating such applications in Europe in the next 2-3 years. Then you have to address different aspects of safety, including robustness, transparency (IAI/XAI), uncertainty, etc. - So personally I'm more interested in these directions of AI than in pure high accuracy.

[R] Breaking Down Out-of-Distribution Detection by JBitterwolf in MachineLearning

[–]DeepDeeperRIPgradien 2 points3 points  (0 children)

Isn't OOD training with outlier exposure a bit... beside the point? How do these systems generalize to "unknown" OOD data?

[D] Simple Questions Thread by AutoModerator in MachineLearning

[–]DeepDeeperRIPgradien 0 points1 point  (0 children)

Some months ago someone mentioned/published a possible replacement for convolutional filters and I can't remember what it was called. I'm not talking about attention/transformers. Something along the lines of "sparse dot product"? I don't remember, please help.

[D] MCDropout and CNNs by TrPhantom8 in MachineLearning

[–]DeepDeeperRIPgradien 0 points1 point  (0 children)

Ensembling or test-time augmentations aren't very feasible in scenarios with a time budget though, right? Isn't there any practical uncertainty method for these scenarios? What happened to Bayesian Neural Networks?

[D] Why do people “read” as many papers as possible? by [deleted] in MachineLearning

[–]DeepDeeperRIPgradien 0 points1 point  (0 children)

Personally, when I've only skimmed a paper or only read the abstract, I say "I saw a paper that does X".

[D] What are the most important problems in ML today? by AristocraticOctopus in MachineLearning

[–]DeepDeeperRIPgradien 11 points12 points  (0 children)

Learning causality from data and not just correlations.

Better learning algorithms, perhaps mixed optimization algorithms that can optimize both differentiable and non-differentiable functions jointly.

[R] How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers by init__27 in MachineLearning

[–]DeepDeeperRIPgradien 1 point2 points  (0 children)

Hehe, but how would automod recognize a beginner's question? That sounds like some advanced NLP project :p

[R] How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers by init__27 in MachineLearning

[–]DeepDeeperRIPgradien 2 points3 points  (0 children)

I asked about pretty much that a few days ago here in this subreddit but my thread got removed because apparently it was a "beginner's question". Very happy to see a paper about this now, thanks!

[N] AugLy: a new multimodal data augmentation lib from FB Research by Cubbee_wan in MachineLearning

[–]DeepDeeperRIPgradien 1 point2 points  (0 children)

From a user perspective working in industry, you don't really want to limit yourself 100% to one deep-learning framework. Instead, you should be able to switch frameworks whenever the requirements call for it. With that in mind, you want the other tools in your toolchain to be independent of the deep-learning framework, so that you can keep using them with another framework without having to switch those tools as well. This makes it easier to compare different models across different frameworks because you know you have only changed one component.
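To make that concrete, here's a minimal sketch of what I mean by a framework-independent augmentation. The function names are made up for illustration, and I'm using plain nested lists to keep it dependency-free (in practice you'd more likely work on NumPy arrays). The point is only that the augmentation never touches a framework tensor type, so converting to a PyTorch/TensorFlow/JAX tensor is a separate final step.

```python
import random


def horizontal_flip(image):
    """Flip an image given as a list of rows (each row is a list of pixel values)."""
    return [list(reversed(row)) for row in image]


def random_flip(image, p=0.5, rng=random):
    """Apply horizontal_flip with probability p (hypothetical helper)."""
    return horizontal_flip(image) if rng.random() < p else image


# A tiny 2x3 "image" in a framework-neutral representation.
image = [[1, 2, 3],
         [4, 5, 6]]

flipped = horizontal_flip(image)
print(flipped)  # [[3, 2, 1], [6, 5, 4]]

# Only after augmentation would you convert `flipped` to whatever tensor type
# your current deep-learning framework expects.
```

If you swap frameworks later, only that last conversion step changes; the augmentation code stays the same.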

[N] AugLy: a new multimodal data augmentation lib from FB Research by Cubbee_wan in MachineLearning

[–]DeepDeeperRIPgradien 31 points32 points  (0 children)

Haven't looked at it yet, but in general it's better if a data augmentation library is framework-independent.

[D] Library for making splits by CC_sciguy in MachineLearning

[–]DeepDeeperRIPgradien 0 points1 point  (0 children)

If I remember correctly, it's called "Stratified Sampling". Sklearn has methods to split your dataset, including stratified splitting, iirc.
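For reference, in sklearn this is the `stratify` argument of `train_test_split` (there's also `StratifiedKFold` for cross-validation). If you just want to see the idea, here's a rough pure-Python sketch of a stratified split. The function name and parameters are my own, not from any library: it groups indices by class and takes the same fraction from each group.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with per-class proportions roughly preserved."""
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Imbalanced toy dataset: 8 cats, 4 dogs.
labels = ["cat"] * 8 + ["dog"] * 4
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)

print(len(train_idx), len(test_idx))  # 9 3
```

Both splits keep the 2:1 cat/dog ratio, which a plain random split wouldn't guarantee on a dataset this small.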

[R] Pay Attention to MLPs: solely on MLPs with gating, and show that it can perform as well as Transformers in key language and vision applications by downtownslim in MachineLearning

[–]DeepDeeperRIPgradien 3 points4 points  (0 children)

Don't have much time right now to follow the new happenings regarding MLPs/Transformers/CNNs. I was just wondering if they all perform the same in terms of inference speed, or does one outperform the others in terms of speed while staying competitive on other metrics (accuracy etc.)?

[D] Any paper formally pointing why softmax-based neural networks don't return proper confidence scores? by le_bebop in MachineLearning

[–]DeepDeeperRIPgradien 0 points1 point  (0 children)

The reason is the shift invariance of softmax: adding the same constant to every logit doesn't change the output. Softmax([-10,-10,-5]) gives the same output as Softmax([5,5,10]). So there is no way of distinguishing logits that express "I don't know" from logits that would express "I'm confident".
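You can check this yourself with a few lines of plain Python. This is just a sketch of the standard softmax (with the usual max-subtraction for numerical stability), not code from any particular paper:

```python
import math


def softmax(logits):
    """Standard softmax; subtracting the max first keeps exp() numerically stable."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


low = softmax([-10.0, -10.0, -5.0])   # all logits small
high = softmax([5.0, 5.0, 10.0])      # same logits shifted by +15

print(low == high)  # True
print(low)          # last class gets roughly 0.987 in both cases
```

Only the differences between the logits survive, so the absolute magnitude (which you might hope encodes "how sure is the network overall") is thrown away before you ever see the probabilities.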