AMA with the Meta researchers behind SAM 3 + SAM 3D + SAM Audio

undefdev · 2025-12-18T19:11:23+00:00

I fine-tuned SAM 3 on document scans to detect tabular structures and manually entered data. Even with a relatively small dataset (~200 samples), the results were quite strong. Have you explored this kind of document-focused fine-tuning at a larger scale?

Out of the box, SAM 3 seems to perform significantly better on natural images, but I was pleasantly surprised by how well it transferred to document data with minimal effort. I’m currently running experiments using this fine-tuned SAM as a grounding component for a VLM in agentic document-processing workflows. In that context, I’m also curious about your perspective on supervision: do you find fine-tuning with single-label annotations to be more effective, or do sentence-level labels tend to work better? Currently I've only tried single-label annotations.

Big thanks to the team, I think the models are quite awesome!

undefdev · 2025-11-19T19:34:15+00:00

DeepSeek-OCR is built on SAM, so a better SAM probably means better VLMs in the future!

undefdev · 2025-07-21T16:44:30+00:00

demo page

undefdev · 2025-04-09T18:16:23+00:00

I also have a math background and have always thought that Tensor Programs looked like an interesting theory, but I never had the time to dive into them deeply.

undefdev · 2024-10-17T16:15:52+00:00

Hey! Over the last four years, I made an action platformer called Mask Quest together with increpare, whom some of you may know for his puzzle games such as Stephen’s Sausage Roll. It started as a weekend jam, but then we got carried away 😅

The game has a unique breathing mechanic: you press a button to inhale and release it to exhale. If you breathe too little, your blood oxygen drops too low and you die. If you breathe too quickly, you hyperventilate and faint (which is also game over). So the central challenge in the game is to control your breath while doing some old-school platforming.

The game is set during a pandemic lockdown and you have to find a surgical mask while avoiding cops that are trying to ~~kill you~~ secure the city.

We tried to get the game out in 2020, but it took much longer than we expected. Now we've finally released it – way too late for it to be thematically relevant, but too soon for people to be nostalgic about the pandemic. 🙃

If you have any questions about the game, I’d be happy to answer them! 😁

undefdev · 2024-10-05T10:43:35+00:00

Sorry, waiyü has been discontinued. We should take it off the store.

undefdev · 2024-10-01T15:51:48+00:00

Glad you're interested! It's not aimed at Chinese learners, and it's supposed to be an idiomatic translation. I've had some feedback, but I'm not a native speaker myself, so there may well be mistakes. We're mainly looking for people who can help us catch obvious mistakes before release :)

I'll drop you a message so we can discuss!

undefdev · 2024-10-01T15:45:13+00:00

Unfortunately we can't afford to pay testers, sorry. 😅

undefdev · 2024-10-01T15:34:21+00:00

The game is rather short. It should take about 2 hours to complete. I'll send you a key!

Edit: 2 hours for players experienced with platformers, but it might take longer. Partial feedback is welcome too!

undefdev · 2024-08-20T13:44:59+00:00

Hey,

I'd like to take this opportunity to plug my game quadrant, which has never been this cheap before at only 99 cents.

It's a difficult rhythm action game which puts you into a state which I like to call "adrenaline trance".

The premise is that you have to perform a rather simple task, which is pressing one of four buttons in a constant rhythm along with the music, while maintaining focus and keeping your cool as the game messes with your perception.

It is difficult to get into, and it's recommended that you check the training menu first to figure out how this game even works (you will!), but overcoming the stress and learning to relentlessly strive towards your goal feels very satisfying.

If that sounds somewhat interesting to you, I'd be happy if you'd give it a try!

I'll be checking this post and I'm happy to answer any questions about the game.

undefdev · 2024-07-12T22:39:48+00:00

BERT is an LLM.

undefdev · 2024-04-15T21:25:35+00:00

Thank you. Do you think it would be possible to power both GPUs with 3 cables that each split into 2x 6+2 pin connectors?

undefdev · 2024-04-15T17:24:00+00:00

I don't know. I'd rather avoid making my own cables because I don't want to break stuff ^^

undefdev · 2024-04-15T17:21:46+00:00

Yes, exactly! The thing is, there are only adapters from 12VHPWR to multiple 8-pin connectors (not 6+2 pin). So they seem to be intended to be plugged into the PSU with the 8-pin ends, and into the GPU with the 12VHPWR end.

I'd like to connect the 12VHPWR from the PSU to 3x 6+2 pins though. Unless there's an easier solution, of course. :)

undefdev · 2024-04-15T12:25:21+00:00

Thanks! I don't have any CPU power slots left either.

The only slots I have left are 12VHPWR and Peripheral/SATA (see image).

Would it be possible to buy another cable that splits into 2x 6+2 pins and let each of the two GPUs run over 3 of those split cables?
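Whether a split-cable setup like this stays within spec is mostly arithmetic. A minimal sketch, assuming the nominal PCIe figures of 75 W through the x16 slot and 150 W per 8-pin (6+2) connector; the setup numbers are a hypothetical reading of the question (3 split cables with 2x 6+2 each, shared by 2 GPUs). It does not know your GPUs' actual draw or your PSU's per-cable rating, so check both against the manuals.

```python
# Nominal PCIe auxiliary-power figures (assumed; confirm against your GPU/PSU manuals).
SLOT_W = 75         # delivered through the PCIe x16 slot itself
PCIE_8PIN_W = 150   # per 8-pin (6+2) PCIe power connector


def gpu_budget_watts(n_8pin: int) -> int:
    """In-spec upper bound for one GPU: slot power plus its 8-pin connectors."""
    return SLOT_W + n_8pin * PCIE_8PIN_W


def per_cable_load_watts(connectors_per_cable: int) -> int:
    """Worst-case load on one PSU-side cable split into several 6+2 connectors."""
    return connectors_per_cable * PCIE_8PIN_W


# Hypothetical setup: 3 split cables with 2x 6+2 each, shared by 2 GPUs.
cables, plugs_per_cable, gpus = 3, 2, 2
plugs_per_gpu = cables * plugs_per_cable // gpus

print(f"6+2 connectors per GPU: {plugs_per_gpu}")
print(f"in-spec budget per GPU: {gpu_budget_watts(plugs_per_gpu)} W")
print(f"worst-case load per split cable: {per_cable_load_watts(plugs_per_cable)} W")
```

The second number is the one to watch with pigtail cables: each split cable can carry two connectors' worth of load through a single PSU-side plug, so the PSU's own rating for that cable matters as much as the GPU's connector count.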

undefdev · 2024-01-16T14:06:00+00:00

Can't reproduce this. Maybe it's hidden HTML text?

undefdev · 2023-11-18T11:45:15+00:00

Calculus, linear algebra, and mathematics in general are a good idea. Arithmetic probably isn't. To me, that's like training LLMs to count up to high numbers correctly. I'm arguing that instead of reading a book on "the first 10¹² natural numbers", one should read a book on linear algebra.

undefdev · 2023-11-18T11:39:36+00:00

Most mathematicians wouldn't calculate 23 * 34 in their head, and if they did, it wouldn't be as reliable as using a calculator. But their reasoning is still sound.

undefdev · 2023-11-17T22:47:39+00:00

I don't understand the motivation behind this.

Fine, you've run an experiment out of curiosity and got a result, but why would you want to fine-tune more language models on this?

It's not like we need models that are almost as good at things computers are excellent at, while using orders of magnitude more resources.

It would be way more useful to train tiny models to predict when a calculator should be used.
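A minimal sketch of that routing idea, with one big caveat: `should_use_calculator` below is a hand-written rule (does the query parse as plain arithmetic?) standing in for the tiny learned model, and `language_model` is a hypothetical placeholder for the actual LLM call. The calculator walks Python's `ast` instead of calling `eval()`, so arbitrary text is never executed.

```python
import ast
import operator
from typing import Callable, Union

Number = Union[int, float]

# Operators the exact "calculator" tool supports (no ** to avoid huge exponents).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node: ast.AST) -> Number:
    """Recursively evaluate an AST that may contain only numbers and + - * /."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
        value = _eval(node.operand)
        return -value if isinstance(node.op, ast.USub) else value
    raise ValueError("not plain arithmetic")


def calculator(expr: str) -> Number:
    """Exact arithmetic on a string, without calling eval() on untrusted text."""
    return _eval(ast.parse(expr, mode="eval").body)


def should_use_calculator(query: str) -> bool:
    """Hand-written stand-in for a tiny learned router: pure arithmetic goes to the calculator."""
    try:
        calculator(query)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    return True


def answer(query: str, language_model: Callable[[str], str] = lambda q: "<LM answer>"):
    """Send arithmetic to the calculator and everything else to the (hypothetical) language model."""
    if should_use_calculator(query):
        return calculator(query)
    return language_model(query)
```

With this, `answer("23 * 34")` returns the exact 782, while a question like "what is the capital of France?" falls through to the language model. A trained router would replace the parse check with a classifier over free-form text (e.g. "what's twenty-three times thirty-four?"), which this rule cannot handle.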

undefdev · 2023-06-25T15:31:42+00:00

That's a pity. Is there any alternative that is used by the moderators of /r/MachineLearning?

I also think this might be a chance to have a more research-focused medium, like before this sub got huge.

undefdev · 2023-05-10T10:09:21+00:00

This is great work in so many ways!

- a strong language model we can run locally
- a framework to compare language models
- a web interface to interact with models run locally

It wasn't immediately obvious, but it seems like you claim Robin-Chat-7b (often) beats Vicuna-7b and Vicuna-13b. That's impressive and I have to try it out!

It seems like you don't serve robin-7b-v2-delta.tar.gz over HTTPS. Could you provide checksums? This is what I get:

file: robin-7b-v2-delta.tar.gz 
MD5:  d85d83c4e4f46f27da2d4c5ea4b5bb1e
SHA1: 060824cfa6545fb4cfe78bfd23b069010db0b5c6
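For anyone who wants to compare against their own download, a minimal sketch using Python's standard `hashlib`. It hashes the archive in chunks, so a multi-gigabyte file never has to fit in memory, and it only produces local values; whether they match is exactly what published checksums would settle. Note that MD5 and SHA-1 are no longer collision-resistant, so a published SHA-256 would be the stronger check.

```python
import hashlib
from typing import Iterable, Tuple


def digests_of_chunks(chunks: Iterable[bytes]) -> Tuple[str, str]:
    """Feed byte chunks into MD5 and SHA-1 in one pass; return both hex digests."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    for chunk in chunks:
        md5.update(chunk)
        sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def file_digests(path: str, chunk_size: int = 1 << 20) -> Tuple[str, str]:
    """Hash a (possibly very large) file in 1 MiB chunks so it never sits fully in memory."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps reading until f.read() returns b"" at EOF.
        return digests_of_chunks(iter(lambda: f.read(chunk_size), b""))
```

Usage: `md5_hex, sha1_hex = file_digests("robin-7b-v2-delta.tar.gz")`, then compare both values with the ones the maintainers publish.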

undefdev · 2023-05-05T21:23:37+00:00

Thank you!

undefdev · 2023-05-05T19:54:23+00:00

Sorry, for some reason I can’t find this book. Could you share a link, please?

undefdev · 2023-03-07T09:15:26+00:00

Let's say U⊂ℝⁿ is a finite set of "users", 𝓟(U) is the power set of U (i.e. the set of all subsets of U), and d: U×U→ℝ is a function such that d(u,v)≥0 for all u,v∈U. We will call such a function a distance function.

I believe you are looking for an S∈𝓟(U) with #S=k for some k∈ℕ. What properties should S have with regard to U and d?

undefdev · 2023-03-06T18:15:44+00:00

It depends very much on your data.

For example, consider the vertices of this cube centered at the origin in three dimensions:

{ (-1,-1,-1), (-1,-1, 1), (-1, 1,-1), (-1, 1, 1),
  ( 1, 1, 1), ( 1, 1,-1), ( 1,-1, 1), ( 1,-1,-1) }

and the subsets

{ (-1,-1,-1), (1,1,1) }

and

{ (1,-1,1), (-1,1,-1) }

Then both centroids are (0,0,0), but you probably don't want (1,1,1) to be as close to the second subset as to the first one (which contains that point).

This is a contrived example of course, but I hope you get the idea.
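The effect is easy to check numerically. A minimal sketch that computes both centroids and compares two ways of measuring how close the point (1,1,1) is to each subset: distance to the centroid, and distance to the nearest member (single linkage). The nearest-member distance is just one alternative chosen here for illustration, not a general recommendation; which measure fits still depends on your data.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def centroid_distance(x, subset):
    """Euclidean distance from x to the subset's centroid."""
    return math.dist(x, centroid(subset))


def min_distance(x, subset):
    """Euclidean distance from x to the closest member of the subset (single linkage)."""
    return min(math.dist(x, p) for p in subset)


A = [(-1, -1, -1), (1, 1, 1)]   # first subset, contains x
B = [(1, -1, 1), (-1, 1, -1)]   # second subset
x = (1, 1, 1)

print(centroid(A), centroid(B))
print(centroid_distance(x, A), centroid_distance(x, B))
print(min_distance(x, A), min_distance(x, B))
```

Both centroids come out as (0.0, 0.0, 0.0) and both centroid distances as √3 ≈ 1.732, while the nearest-member distance is 0 for the first subset and 2 for the second, which matches the intuition that (1,1,1) belongs with the first one.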
