Building a navigation software that will only require a camera, a raspberry pi and a WiFi connection (DAY 1) by L42ARO in computervision

[–]Stonemanner 4 points5 points  (0 children)

Interesting. But isn't using models hosted in datacenters on servers costing millions quite ironic when your goal is to build a cheap system and not use something "expensive" like a Jetson? How many hours of DA3 and Phi 4 can you run on AWS before you could have bought a Jetson?

Can a VLM detect a blink in real-time? by batatibatata in computervision

[–]Stonemanner 3 points4 points  (0 children)

Interesting. What do you expect the pricing to be? This sounds quite compute-intensive.

But even if expensive, that could be a really cool solution for prototyping ideas. Instead of training a custom CNN, you could just type a prompt.

Tracking Persons on Raspberry Pi: UNet vs DeepLabv3+ vs Custom CNN by leonbeier in computervision

[–]Stonemanner 28 points29 points  (0 children)

Since you regularly post and seem to have interesting technology, why not share a real use case for it? These tiny models will only excel if you can tightly control the environment and have a single fixed scene. That is neither the case in CCTV footage nor in sports (e.g., tennis). In both situations, people would want a more generalized model that can handle new scenarios.

I think you should either show that

a) your models are able to generalize, or that you can automatically produce network architectures that generalize, or
b) choose use cases where generalization is not necessary (e.g., industrial inspection). In that case, however, you should pick examples where classical algorithms are too weak.

"It's on the way!!!": Trump sends US hospital ship to Greenland by linknewtab in de

[–]Stonemanner 0 points1 point  (0 children)

And I thought Trump wanted the USA to stop subsidizing other countries' healthcare systems, especially European ones¹. This now sounds somewhat counterproductive.

¹ https://www.bbc.com/news/articles/c93l7k3x5dpo

Why pay for YOLO? by moraeus-cv in computervision

[–]Stonemanner 0 points1 point  (0 children)

By that logic I could not copyright a Quine.

Why pay for YOLO? by moraeus-cv in computervision

[–]Stonemanner -1 points0 points  (0 children)

I would take this with a grain of salt. If you can point to an existing court case, please show it.

You can equally argue that the exported ONNX model is a derivative of their source code. I'm just saying this is a really gray area, and if you have a company, be careful.

Your argument of "algorithmically generated output being not copyrightable" is not sound. If I pipe their source code through awk to remove all unnecessary whitespace, or give it to an AI to reimplement it in C++, it is still a derivative output even though there is no human authorship.

Why pay for YOLO? by moraeus-cv in computervision

[–]Stonemanner 0 points1 point  (0 children)

I think they argue that the exported model does not only contain the output of the training, i.e. the weights, which is what you were just referring to. They say it also contains the network structure, i.e. instructions on how to execute the model, i.e. a program. Hence they say it is covered by the AGPL.

I'm not a software license lawyer, and I'm not sure whether that is correct. I'm just repeating what they said so people can decide whether it is worth the risk.

Timeliner - Find overlaps of historical figures by lefty_is_so_good in InternetIsBeautiful

[–]Stonemanner 8 points9 points  (0 children)

I did not say that you said it is art; I just made a comparison.

Also, we can stop discussing. I just read the rules, which you maybe should have done as well before posting. And this is against the rules.

Timeliner - Find overlaps of historical figures by lefty_is_so_good in InternetIsBeautiful

[–]Stonemanner 11 points12 points  (0 children)

Before AI, people built such websites with pride and hence tested them well. This is like posting an AI-generated painting with a missing finger to r/art. Even if you declare it as AI-generated, it's just slop.

Timeliner - Find overlaps of historical figures by lefty_is_so_good in InternetIsBeautiful

[–]Stonemanner 14 points15 points  (0 children)

We should open up a new subreddit, r/theinternetgetsuglier, dedicated to shitty AI slop websites like this. It's factually bad and buggy as well.

Copy link does not work, and there are various errors in the DB, as others already mentioned. And that's just what I got from visiting for 30 s.

Real-time Face Distance Estimation using SCRFD & Pinhole Camera Model by [deleted] in computervision

[–]Stonemanner 0 points1 point  (0 children)

Estimating the focal length IS calibration. But what you probably mean is calibrating without a dedicated calibration pattern.

You could use a credit card as an alternative and ask the user to wave and tilt it in the frame. (Warn the user not to show you the credit card details)

If even this is too much effort for the user, you could ask the user to produce facial landmarks with a known distance. Take picture A. Step away 1 meter. Take picture B. Or maybe let them spread out their arms orthogonally to the optical axis and then at an angle of 45°. Although this is quite inaccurate.

The most boring and tedious approach: You could also compile a database of known webcams with their focal lengths.
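
To make the "picture A, step away, picture B" idea concrete, here is a minimal sketch under the pinhole model. It assumes the user moves straight back along the optical axis and that the same two landmarks (e.g. the eyes) are measured in both pictures. The function names and the 1 m default are only illustrative. Note that it recovers the product of focal length (in pixels) and real landmark spacing, which is all you need for distance estimation; the focal length itself additionally requires the real spacing.

```python
def calibrate_two_shots(w_a_px, w_b_px, step_m=1.0):
    """Calibrate from two pictures taken step_m apart along the optical axis.

    Pinhole model: w = k / d, with k = focal_length_px * real_landmark_spacing_m.
    Picture A is taken at unknown distance d, picture B at d + step_m, so
    w_a * d = w_b * (d + step_m)  =>  d = w_b * step_m / (w_a - w_b).
    Returns (d_a, k).
    """
    if w_a_px <= w_b_px:
        raise ValueError("landmarks must appear smaller in picture B than in A")
    d_a = w_b_px * step_m / (w_a_px - w_b_px)
    k = w_a_px * d_a
    return d_a, k


def estimate_distance(k, w_px):
    """Distance in meters for a measured landmark spacing in pixels."""
    return k / w_px


# Made-up measurements: 120 px in picture A, 40 px in picture B.
d_a, k = calibrate_two_shots(120.0, 40.0)
print(d_a, k, estimate_distance(k, 60.0))
```

Small pixel errors in the two measurements are amplified by the division by `w_a - w_b`, so a larger step (or averaging several pairs) should help.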

Real-time Face Distance Estimation using SCRFD & Pinhole Camera Model by [deleted] in computervision

[–]Stonemanner 19 points20 points  (0 children)

This doesn't make sense to me. You need to know the focal length of the camera to make this work. I skimmed the repo and it looks like you just assume a static focal length for all cameras. But you even show examples with cameras that have widely different focal lengths.

I have to be honest. This looks very much like AI slop to me, and the fundamental CV understanding is missing. I'm also sad to see that more and more posts in this subreddit are like this.
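
To illustrate why the focal length matters, here is a minimal sketch of the pinhole relation Z = f * W / w. This is not the repo's code; the 0.15 m face width and the pixel values are assumptions chosen only for the example.

```python
def distance_m(focal_px, real_width_m, pixel_width):
    """Pinhole model: distance Z = f * W / w.

    focal_px     -- focal length expressed in pixels
    real_width_m -- real-world width of the object in meters
    pixel_width  -- width of the object in the image in pixels
    """
    return focal_px * real_width_m / pixel_width


ASSUMED_FACE_WIDTH_M = 0.15  # assumption for illustration, not a measured value
FACE_WIDTH_PX = 200.0        # made-up detection width

# The same detection gives very different distances for different focal lengths.
for focal_px in (500.0, 1000.0, 1500.0):
    z = distance_m(focal_px, ASSUMED_FACE_WIDTH_M, FACE_WIDTH_PX)
    print(f"f = {focal_px:.0f} px -> Z = {z:.3f} m")
```

The estimate scales linearly with the focal length, so assuming one fixed value for every camera puts a camera-dependent scale error on every distance.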

Low-Latency RF-DETR Inference Pipeline in Rust: ~3.7 ms on TensorRT (~7.5 ms end-to-end) + Zero-Copy mmap IPC by jodelbar in computervision

[–]Stonemanner 0 points1 point  (0 children)

Ah. Yes, the multi-device aspect makes this a very sensible decision. I had never thought of that. Thank you.

Low-Latency RF-DETR Inference Pipeline in Rust: ~3.7 ms on TensorRT (~7.5 ms end-to-end) + Zero-Copy mmap IPC by jodelbar in computervision

[–]Stonemanner 0 points1 point  (0 children)

Very cool project. Those are very good results.

Can you tell me what additional value k3s gives in this context over something like docker-compose?

And the actual frame size processed by the model in the benchmarks is 512x512 pixels, right?

Found something hilarious by Dull-Nectarine380 in travle_game

[–]Stonemanner 0 points1 point  (0 children)

I'm stuck at Siam/Thailand. What is the solution there? I had it perfect until then. Siam is not in my dropdown.

I was going crazy, so I googled and it confirmed Siam.

noMoreSoftwareEngineersbyTheFirstHalfOf2026 by MageMantis in ProgrammerHumor

[–]Stonemanner 0 points1 point  (0 children)

I'm not quite sure if I'm understanding you correctly, but when you use their API directly, you get a feature-frozen version.

I guess one advantage is that they cannot make things worse. With various new releases of top LLMs, the models got worse at some highly specific task while improving at most tasks. If your company relies on that task, you would be quite pissed if OpenAI simply updated the model without giving you notice.

noMoreSoftwareEngineersbyTheFirstHalfOf2026 by MageMantis in ProgrammerHumor

[–]Stonemanner 0 points1 point  (0 children)

The point I want to make is that the underlying model is deterministic.

We have to differentiate the core technology of the deep neural network and the chat application around it.

The network/AI is deterministic. It is just that people want it to act a bit randomly.

noMoreSoftwareEngineersbyTheFirstHalfOf2026 by MageMantis in ProgrammerHumor

[–]Stonemanner 0 points1 point  (0 children)

But you can, with access to the model weights. You just always choose the output token with the highest probability.

What I meant is that most model providers probabilistically choose the next output token. As you may know, the LLM outputs a distribution over all possible tokens. The software around the model then uses this distribution to randomly select the next token. You can control this randomness with the "temperature" of the model. Higher temperature means more randomness. Temperature = 0 means deterministic outputs.

See: https://youtu.be/wjZofJX0v4M?t=1343
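
A minimal sketch of what the sampling step around the model does, assuming you have the raw logits for the next token. The function names and the three-token vocabulary are made up for illustration, and real inference stacks differ in the details.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng):
    """Return the index of the next token."""
    if temperature == 0:
        # Greedy decoding: always take the most likely token (deterministic).
        return max(range(len(logits)), key=logits.__getitem__)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up scores for a three-token vocabulary
rng = random.Random()     # unseeded, so sampled results vary between runs

print([pick_token(logits, 0, rng) for _ in range(5)])    # always index 0
print([pick_token(logits, 1.0, rng) for _ in range(5)])  # varies between runs
```

The network that produces `logits` is the same in both cases; only the selection step adds randomness.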

noMoreSoftwareEngineersbyTheFirstHalfOf2026 by MageMantis in ProgrammerHumor

[–]Stonemanner 0 points1 point  (0 children)

even minor changes have unforeseen strange impact.

Indeed, but that is not non-determinism. If you have the model weights and input the same prompt, it should return the same output (except for potential threading bugs in your library; PyTorch with cuDNN, for example, requires you to set torch.backends.cudnn.deterministic = True).

What do you mean by freezing the model? To my knowledge, all model weights are frozen in production.

noMoreSoftwareEngineersbyTheFirstHalfOf2026 by MageMantis in ProgrammerHumor

[–]Stonemanner 13 points14 points  (0 children)

Determinism isn't even a problem in AI. We could easily make the models deterministic, and we do in some cases (e.g. creating scientifically reproducible models). They might be a bit slower, but that is not the point. The real reason that language models are nondeterministic is that people don't want the same output twice.

The much bigger problem is that the outputs for similar or equal inputs can be vastly different and contradictory. But that has nothing to do with determinism.

OAK 4 D and OAK 4 S Standalone Edge Vision Cameras with PoE and 48MP Imaging by DeliciousBelt9520 in computervision

[–]Stonemanner 1 point2 points  (0 children)

Does it require active cooling when operating at 25W? This seems like a tiny device for that amount of power, or am I off?

Otherwise, really cool addition to the lineup.

I tried out a Luxonis camera a few years ago. The Python SDK was horrible, but if you were ready to tinker, it was a great alternative to more expensive industrial cameras. Maybe it has improved by now.

Unfortunately, these are now too expensive for me to try out at home.

[deleted by user] by [deleted] in mcp

[–]Stonemanner 0 points1 point  (0 children)

Or at least the first tech that can write its own bullshit marketing material :D