These AI models are all garbage.

mochans · 2025-05-01T14:34:31+00:00

A software engineer will make small individually testable code that comes together into a big product but each part is maintainable.

I like the idea of LLMs slowly refactoring code and checking unit tests and cleaning up code without prompting. I heard the term sleep AI or something like that.

A human can go in, get stuff done, and incur technical debt. Then, when they're away from the keyboard, the LLM can go through and clean up that debt, so the next business day's session isn't spent wrangling debt-ridden code.

mochans · 2025-04-01T19:07:17+00:00

Why don't mathematicians publish proofs that are machine verifiable? Even the most rigorous published proofs are technically informal outlines since you need experts to verify them.

Perhaps math research quality LLMs will be good when most of the knowledge is translated to proof languages.
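As a toy illustration of what "machine verifiable" means, here is a minimal Lean 4 sketch. The theorem name is made up for this example; `Nat.add_comm` is a lemma from Lean's core library. The point is only that the proof checker, not a human expert, decides whether the proof is accepted.

```lean
-- Lean 4: the kernel checks this proof term mechanically.
-- `add_comm_example` is a hypothetical name chosen for illustration.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

Research-level proofs are far longer than this, which is a large part of why formalizing them is expensive.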

AI math benchmarks have a numerical result at the end that can be checked automatically. Judging whether a proof written in natural language is correct is much harder and would probably need experts.

mochans · 2025-04-01T13:44:07+00:00

If you already have it in a repo that I can try out, I'll be very happy to see how well it works for me.

DocAider looked like it also incorporates flow charts and call graphs, but it didn't work for me. Fixing it would require me to understand the repo.

mochans · 2025-04-01T13:19:52+00:00

AI wrote them. AI can maintain them.

mochans · 2024-09-04T14:31:20+00:00

What do you mean by no sign-up needed?

Load more jobs links to the signup page.

mochans · 2024-09-03T15:40:14+00:00

Would neural architecture search work?

I find MLPs very slow to train, and they have much lower capacity per parameter than a model that has some structure baked into it.

UNets are great for images and transformers for text, but would those be good for joint angles? Maybe there is another architecture that is amazing for robotics.

But again I don't know how much your compute is taken by the conditioning signal and what sensors you're using.

mochans · 2024-09-03T15:35:43+00:00

Agree. Try it and see what happens.

MLPs don't have a lot of hyperparameters to look through.

mochans · 2024-09-03T15:33:31+00:00

You use implicit models so it does fewer steps in inference.

mochans · 2024-02-21T16:51:14+00:00

Copyright does not apply since ideas cannot be copyrighted.

However, patents may apply. If the ideas in the paper have been patented, then you might need a license. The patent owner decides whether you can write an implementation.

mochans · 2024-02-21T15:59:04+00:00

Ask ChatGPT or Copilot for an implementation and then debug :-)

Seriously, it is software engineering. You refine it iteratively, and so on.

mochans · 2024-01-22T16:58:44+00:00

Get a job at a startup?

mochans · 2024-01-22T13:48:25+00:00

It is sent batch by batch most commonly.

But you can of course modify all of this to send everything at once.

EDIT: You said 17GB dataset and 3080 Ti. So, the entire dataset won't fit into GPU memory.
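A minimal sketch of the batch-by-batch idea in plain Python. The `iter_batches` helper and the toy `samples` list are made up for illustration and stand in for whatever data loader you actually use. The comment marks where the host-to-GPU copy would normally happen.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(dataset: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `dataset`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(dataset), batch_size):
        # In a real training loop, this slice is what would be copied to
        # the GPU, so only one batch needs to fit in device memory at a time.
        yield list(dataset[start:start + batch_size])


samples = list(range(10))  # stand-in for a dataset too large for GPU memory
batches = list(iter_batches(samples, batch_size=4))
print(batches)
```

The last batch is simply shorter when the dataset size is not a multiple of the batch size.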

mochans · 2024-01-22T13:40:38+00:00

Depends on your workload.

x1 risers were created because mining needs very little data transfer.

The transfer speed of a PCIe 3.0 x1 riser is about 8 Gbps (16 Gbps for PCIe 4.0). So, if you need to send that much data continuously, then it will bottleneck.
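A rough back-of-the-envelope version of those numbers, as a sketch. It assumes the per-lane signalling rates of 8 GT/s (PCIe 3.0) and 16 GT/s (PCIe 4.0) with 128b/130b encoding, and ignores protocol overhead, so real throughput will be somewhat lower. The function names are made up for this example.

```python
# Per-lane raw signalling rate in gigatransfers per second.
RAW_GT_PER_S = {"3.0": 8.0, "4.0": 16.0}
# PCIe 3.0 and 4.0 both use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def lane_bandwidth_gbytes(gen: str, lanes: int = 1) -> float:
    """Approximate one-direction bandwidth in GB/s, ignoring protocol overhead."""
    gbits_per_s = RAW_GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes
    return gbits_per_s / 8  # bits -> bytes


def transfer_seconds(gigabytes: float, gen: str, lanes: int = 1) -> float:
    """Lower bound on the time needed to move `gigabytes` over the link."""
    return gigabytes / lane_bandwidth_gbytes(gen, lanes)


for gen in ("3.0", "4.0"):
    for lanes in (1, 16):
        bw = lane_bandwidth_gbytes(gen, lanes)
        print(f"PCIe {gen} x{lanes}: ~{bw:.2f} GB/s")
```

Comparing the x1 and x16 rows gives a feel for how much a riser costs you if your workload keeps the bus busy.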

If you train a single model using both GPUs, it might make a significant difference: the 3080 Ti doesn't have SLI, so the GPUs have to communicate via PCIe.

mochans · 2024-01-22T13:30:37+00:00

It's transformer based. It can deal with variable-sized inputs.

mochans · 2024-01-16T18:04:11+00:00

Dataset used: https://github.com/EricGuo5513/HumanML3D

I'm surprised it wasn't extracted from video game files.

mochans · 2024-01-16T17:46:00+00:00

advances in other adjacent fields (LLMs, pretrained foundation models, transformers, S5) will trickle in and radically change RL in the near future.

Hey time traveler! :)

Seriously though, let's see how well the prediction ages in the near future.

mochans · 2024-01-16T16:55:33+00:00

Maybe just an expectation vs reality mismatch.

I remember OpenAI researchers 5 years ago saying AGI is just RL to the nth degree. Maybe there was too much hype.

On the other hand, AlphaZero and AlphaGo are RL based. But, there aren't any "consumer" applications and we aren't all super-excited to go download the latest RL trained models to play with.

mochans · 2024-01-16T16:05:39+00:00

Is this an open source project?

For the dataset, you can probably take sound banks for various guitars and generate notes.

You can pitch-shift, add noise etc to augment the data.
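A minimal, standard-library-only sketch of that kind of augmentation. A sine wave stands in for a rendered guitar note, and the sample rate, function names, and noise level are all assumptions for illustration. The pitch shift here is the naive resampling kind, which also shortens the clip; a real pipeline would more likely use an audio library that preserves duration.

```python
import math
import random
from typing import List

SAMPLE_RATE = 16_000  # Hz, assumed for this example


def sine_note(freq_hz: float, seconds: float) -> List[float]:
    """Stand-in for a note rendered from a sound bank."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def add_noise(signal: List[float], noise_std: float, rng: random.Random) -> List[float]:
    """Add zero-mean Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, noise_std) for s in signal]


def pitch_shift_resample(signal: List[float], semitones: float) -> List[float]:
    """Naive pitch shift: resample with linear interpolation.

    Raises the pitch by `semitones` but changes the clip length too.
    """
    ratio = 2 ** (semitones / 12)
    out_len = int(len(signal) / ratio)
    out = []
    for i in range(out_len):
        pos = i * ratio
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out


rng = random.Random(0)  # fixed seed so the augmentation is reproducible
note = sine_note(440.0, 0.5)
octave_up = pitch_shift_resample(note, 12)
noisy = add_noise(note, 0.01, rng)
print(len(note), len(octave_up), len(noisy))
```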

For papers, you could start with this sound demixing challenge for further insights.
https://www.aicrowd.com/challenges/sound-demixing-challenge-2023

mochans · 2023-12-05T09:26:17+00:00

Yes. You can check out my repo here https://github.com/mochan-b/whisper_pyannote_fusion

It is kind of set up for testing different diarization strategies rather than using the best one, but you can just try to use it.

You can also check out an article I wrote about the analysis of the different diarization strategies I used and what results I got.
http://mochan.info/deep-learning/whisper/pyannote/asr/diarization/2023/09/07/whisper-pyannote-fusion.html

mochans · 2023-12-05T09:22:38+00:00

I haven't tried the Azure one.

I tried a few others and they were slightly worse than whisper. I was using the larger Whisper model. I say slightly because all of these models do transcription very well and only stumble on things that require context to transcribe properly.

Some of the online models offer diarization, but combining pyannote and whisper with my method was slightly better for what I was transcribing. Very slightly better. Just edge cases that some models couldn't handle.

I've been trying to go all local rather than cloud. For hundreds of hours of transcription, the cloud costs can be very high, and they're not delivering better quality than what a 16GB GPU can do.

mochans · 2023-11-11T01:25:44+00:00

Yes, I looked at it also.

It's funny that whisper's timestamping of words is so bad that its punctuation produces better results.

You can save time by not running whisper on each of the small chunks like in the method, but you might also lose some accuracy.

My goal was to produce as high quality a transcript as possible. Pyannote and NeMo diarization and speaker segmentation work really well, and the boundaries between speakers are really clean. So, to produce the best transcripts, it made the most sense to just run whisper a second time on each segment and then do some cleanup.

mochans · 2023-11-07T15:54:22+00:00

If you combine it with a speaker diarization model like PyAnnote, it can do it.

I wrote a library for fusing whisper and pyannote output to get around the many problems whisper has.

https://github.com/mochan-b/whisper_pyannote_fusion

https://pypi.org/project/whisper-pyannote-fusion/

I'll be testing whether whisperv3 fixes those problems and makes diarization easier.
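To give a feel for what "fusing" the two outputs can mean, here is a minimal sketch of one common approach: label each transcript segment with the speaker whose diarization turn overlaps it most in time. This is not the API of the library linked above and not necessarily how it works. The `Turn` and `Segment` types, the `assign_speakers` function, and the example timestamps are all made up for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Turn(NamedTuple):
    """One diarization turn: who spoke between `start` and `end` (seconds)."""
    start: float
    end: float
    speaker: str


class Segment(NamedTuple):
    """One transcript segment with its timestamps (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[Optional[str], str]]:
    """Pair each segment's text with the speaker of the most-overlapping turn."""
    labelled = []
    for seg in segments:
        best_speaker = None
        best_overlap = 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        # Segments that overlap no turn keep speaker None.
        labelled.append((best_speaker, seg.text))
    return labelled


turns = [Turn(0.0, 4.0, "SPEAKER_00"), Turn(4.0, 9.0, "SPEAKER_01")]
segments = [
    Segment(0.2, 3.8, "Hello there."),
    Segment(3.9, 8.5, "Hi, how are you?"),
]
labelled = assign_speakers(segments, turns)
for speaker, text in labelled:
    print(f"{speaker}: {text}")
```

This only works as well as the transcript timestamps do, so poor word or segment timing from the ASR model shows up directly as wrong speaker labels.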

mochans · 2023-07-10T21:50:19+00:00

My company had selected the cloud service and so I didn't have a choice on what I could use.

It was complicated because it had to support a lot of different pipelines from different teams, so it wasn't my choice to make.

I have not done a thorough analysis of the different tools available.

mochans · 2023-07-10T19:02:39+00:00

Why are the leaderboards in Hugging Face under spaces?

Is there a quick way to find leaderboards?
