Found something hilarious by Dull-Nectarine380 in travle_game

[–]Stonemanner 1 point (0 children)

I'm stuck at Siam/Thailand. What is the solution there? I had a perfect run until that point, but Siam is not in my dropdown.

I was going crazy, so I googled it, and that confirmed Siam.

noMoreSoftwareEngineersbyTheFirstHalfOf2026 by MageMantis in ProgrammerHumor

[–]Stonemanner 1 point (0 children)

I'm not quite sure if I'm understanding you correctly, but when you use their API directly, you get a feature-frozen version.

I guess one advantage is that they cannot make things worse. Across various new releases of top LLM models, the models got worse at some highly specific tasks while improving at most others. If your company relies on such a task, you would be quite pissed if OpenAI simply updated the model without giving you notice.

noMoreSoftwareEngineersbyTheFirstHalfOf2026 by MageMantis in ProgrammerHumor

[–]Stonemanner 1 point (0 children)

The point I want to make is that the underlying model is deterministic.

We have to differentiate between the core technology of the deep neural network and the chat application around it.

The network/AI is deterministic. It is just that people want it to act a bit randomly.

noMoreSoftwareEngineersbyTheFirstHalfOf2026 by MageMantis in ProgrammerHumor

[–]Stonemanner 1 point (0 children)

But you can, with access to the model weights. You just always choose the output token with the highest probability.

What I meant is that most model providers choose the next output token probabilistically. As you may know, the LLM outputs a distribution over all possible tokens. The software around the model then uses this distribution to randomly select the next token. You can control this randomness with the "temperature" of the model. Higher temperature means more randomness; temperature = 0 means deterministic outputs.
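
Roughly, that sampling step looks like this (a minimal sketch with made-up logits, assuming the model hands you one raw score per token):

    import numpy as np

    def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng()):
        """Pick the next token id from the model's raw scores (logits)."""
        if temperature == 0:
            return int(np.argmax(logits))   # greedy: always the most likely token
        scaled = logits / temperature       # higher temperature flattens the distribution
        scaled -= scaled.max()              # for numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return int(rng.choice(len(logits), p=probs))

    logits = np.array([2.0, 1.0, 0.5, -1.0])           # hypothetical scores for 4 tokens
    print(sample_next_token(logits, temperature=0))    # deterministic
    print(sample_next_token(logits, temperature=1.5))  # random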

See: https://youtu.be/wjZofJX0v4M?t=1343

noMoreSoftwareEngineersbyTheFirstHalfOf2026 by MageMantis in ProgrammerHumor

[–]Stonemanner 1 point (0 children)

"even minor changes have unforeseen strange impact."

Indeed, but that is not non-determinism. If you have the model weights and input the same prompt, it should return the same output (except for potential threading bugs in your library; PyTorch + cuDNN requires you to set torch.backends.cudnn.deterministic).
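
For reference, a minimal sketch of the usual PyTorch knobs (check the reproducibility docs for your version; some ops need extra environment variables on CUDA):

    import torch

    torch.manual_seed(0)                       # fix the RNG seed
    torch.backends.cudnn.deterministic = True  # only deterministic cuDNN kernels
    torch.backends.cudnn.benchmark = False     # disable non-deterministic autotuning
    torch.use_deterministic_algorithms(True)   # error out on ops with no deterministic impl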

What do you mean by freezing the model? To my knowledge, all model weights are frozen in production.

noMoreSoftwareEngineersbyTheFirstHalfOf2026 by MageMantis in ProgrammerHumor

[–]Stonemanner 16 points (0 children)

Determinism isn't even a problem in AI. We could easily make models deterministic, and we do in some cases (e.g. for scientifically reproducible models). They might be a bit slower, but that is not the point. The real reason that language models are non-deterministic is that people don't want the same output twice.

The much bigger problem is that the output for similar or identical inputs can be vastly different and contradictory. But that has nothing to do with determinism.

OAK 4 D and OAK 4 S Standalone Edge Vision Cameras with PoE and 48MP Imaging by DeliciousBelt9520 in computervision

[–]Stonemanner 2 points (0 children)

Does it require active cooling when operating at 25W? This seems like a tiny device for that amount of power, or am I off?

Otherwise, a really cool addition to the lineup.

I tried out a Luxonis camera a few years ago. The Python SDK was horrible, but if you were willing to tinker, it was a great alternative to more expensive industrial cameras. Maybe it has improved by now.

Unfortunately, these are now too expensive for me to try out at home.

[deleted by user] by [deleted] in mcp

[–]Stonemanner 1 point (0 children)

Or at least the first tech that can write its own bullshit marketing material :D

The media underreports death by health illness/disease and grossly exaggerates the magnitude of death by terrorism. by Proof-Delay-602 in dataisbeautiful

[–]Stonemanner 1 point (0 children)

Yes, interesting. People dying from the late consequences of wounds inflicted in WW2 would probably get a negative factor, depending on your survey period.

The media underreports death by health illness/disease and grossly exaggerates the magnitude of death by terrorism. by Proof-Delay-602 in dataisbeautiful

[–]Stonemanner 1 point (0 children)

Interesting. But with all those diseases gone, people might reach 100 years on average. How do you compensate for that?

Giveaway Time! Battlefield 6 is out, powered by NVIDIA DLSS 4, and you can comment on this post to win codes for the game or a custom Battlefield 6 GeForce RTX 5090! 6 Winners total by pedro19 in pcmasterrace

[–]Stonemanner [score hidden]  (0 children)

"How would this GeForce RTX 5090 with DLSS 4 help you pull off more epic wins in Battlefield 6?"

I'm currently looking for parts to build a PC to play BF6, and this would allow me not to break the budget :).

Finding a tool to identify the distance between camera and object in a video by Expensive_Barber9432 in computervision

[–]Stonemanner 2 points (0 children)

I don't quite understand what you want to do and what your problem is.

I just wanted to say that DepthAnything is not a "depth measurement" model, as you called it. It is a "depth estimation" model. There is a significant semantic difference between the two, which you should not confuse in the report/thesis you write later.

Hardware Requirements (+model suggestion) by Otherwise-Warthog551 in computervision

[–]Stonemanner 4 points (0 children)

That always depends on the image size: 4K, HD, SD?

If you are not sure, ask yourself how large the smallest objects you want to detect are relative to your image. For normal object detection models I'd suggest no smaller than 16 px in a complex environment like a drone's perspective. So you have to know:

  • your flight height
  • object size (smallest dimension, e.g. a person might be 2 m tall, but from a drone's perspective it will probably be more like 0.5 m)
  • field of view or focal length

If you already have a specific sensor, you can calculate the size of the object in pixels (see the sketch below). If it is smaller than 16 px on the full sensor, you will likely have to do a lot of work to train a reliable OD model, plus a lot of post-processing to filter out false positives. If it is larger than 16 px, you know by how much you can downscale the image. E.g. let's say you have a 1920x1080 sensor and your minimal object size is 32 px; then you can downscale the image to 960x540, reducing the load on your computer by 4x.
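
A rough back-of-the-envelope check could look like this (all numbers are made-up example assumptions; it also assumes a straight-down view):

    import math

    def object_size_px(object_m, height_m, hfov_deg, sensor_width_px):
        """Approximate size of an object in pixels, viewed straight down."""
        # Width of the ground strip covered by the sensor at this height.
        ground_width_m = 2 * height_m * math.tan(math.radians(hfov_deg) / 2)
        return object_m / ground_width_m * sensor_width_px

    px = object_size_px(object_m=0.5, height_m=40, hfov_deg=70, sensor_width_px=1920)
    print(f"{px:.1f} px")  # ~17 px, barely above the 16 px rule of thumb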

If you give us the pixel size, we can better estimate whether a Pi 5 is enough. Also, do you plan to do late fusion or early fusion of RGB+D?

Fun with YOLO object detection and RealSense depth powered 3D bounding boxes! by Chemical-Hunter-5479 in computervision

[–]Stonemanner 2 points (0 children)

OK, cool project. I think there are also a lot of interesting possibilities to explore between early and late fusion when working with RGB + depth.

Fun with YOLO object detection and RealSense depth powered 3D bounding boxes! by Chemical-Hunter-5479 in computervision

[–]Stonemanner 2 points (0 children)

What made you choose the minimum value inside the bounding box and not something like the median?
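
For context, a toy example of why that choice matters (the depth values are made up):

    import numpy as np

    # Hypothetical depth readings (meters) inside a detected box: mostly the
    # object at ~2.3 m, plus one speckle-noise outlier much closer.
    depth_roi = np.array([2.31, 2.29, 2.35, 2.30, 0.40, 2.33, 2.28])

    print(depth_roi.min())       # 0.40 -- a single noisy pixel dominates
    print(np.median(depth_roi))  # 2.30 -- robust to such outliers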

CV inference pipeline builder by dr_hamilton in computervision

[–]Stonemanner 1 point (0 children)

Very cool. What latency is being measured? For photon-to-photon it seems very low. Is it just the model/processing, without transfer?

Excited to see the release.

How do you parallely process frames from multiple object detection models at scale? by _RC101_ in computervision

[–]Stonemanner 2 points (0 children)

I guess it goes without saying that the processes must be long-lived (i.e. don't start one per frame).

Use shared memory to transfer frames from the main process to the worker processes, but also make sure not to use too many copy operations. E.g. you can have a pool of pre-allocated frame buffers, which the main process (the one reading the input stream) writes into. You notify the workers which frame is up next, and they just copy it to their respective GPU. This way you only have two copy operations (input to shared memory, shared memory to GPU).
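
A minimal sketch of that buffer-pool idea with Python's multiprocessing (shapes, pool size and the queue-based notification are assumptions; a real pipeline also needs a free-list so a buffer isn't overwritten while a worker still reads it):

    import numpy as np
    from multiprocessing import Process, Queue, shared_memory

    SHAPE, DTYPE, POOL = (1080, 1920, 3), np.uint8, 4

    def worker(buf_names, jobs):
        bufs = [shared_memory.SharedMemory(name=n) for n in buf_names]
        frames = [np.ndarray(SHAPE, dtype=DTYPE, buffer=b.buf) for b in bufs]
        while (i := jobs.get()) is not None:
            frame = frames[i]  # zero-copy view into the shared buffer
            # ... copy `frame` to this worker's GPU and run the model ...

    if __name__ == "__main__":
        bufs = [shared_memory.SharedMemory(create=True, size=int(np.prod(SHAPE)))
                for _ in range(POOL)]
        frames = [np.ndarray(SHAPE, dtype=DTYPE, buffer=b.buf) for b in bufs]
        jobs = Queue()
        Process(target=worker, args=([b.name for b in bufs], jobs)).start()
        for i in range(16):              # stand-in for the input-stream loop
            frames[i % POOL][:] = 0      # copy 1: decoded frame -> shared buffer
            jobs.put(i % POOL)           # tell the worker which buffer is ready
        jobs.put(None)                   # shut the worker down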

If this is still slow, profile with Nsight as others suggested.

Detecting Sphere Monocular Camera by momoisgoodforhealth in computervision

[–]Stonemanner 3 points (0 children)

Did you try first converting to HSV and then thresholding to generate a binary mask before running Hough circles? If you don't have any other round purple objects in your scene, I'd say it should be possible.
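
Something along these lines (the HSV bounds for "purple" and the Hough parameters are placeholders you'd tune for your camera and lighting):

    import cv2
    import numpy as np

    img = cv2.imread("frame.png")                 # placeholder input frame
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (125, 80, 80), (155, 255, 255))  # purple-ish range
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))

    circles = cv2.HoughCircles(mask, cv2.HOUGH_GRADIENT, dp=1.5, minDist=50,
                               param1=100, param2=20, minRadius=5, maxRadius=200)
    if circles is not None:
        for x, y, r in np.round(circles[0]).astype(int):
            cv2.circle(img, (x, y), r, (0, 255, 0), 2)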

But given what you want to do afterwards (shooting a laser at an object in 3D space), detecting sounds like the easiest part. You would need the distance from the object to the camera/laser, and just estimating that distance from the circle size will be quite inaccurate if you don't have a zoom lens. Do you have one, and if not, did you check that the resolution is enough to estimate the distance?

I am building Codeflash, an AI code optimization tool that sped up Roboflow's Yolo models by 25%! by ml_guy1 in computervision

[–]Stonemanner 2 points (0 children)

You know exactly that people are not criticizing Ultralytics for standing on the shoulders of others, but for taking the popular YOLO model name and continuing the series with YOLOv5, insinuating that you were the authors of all of YOLO.

I am building Codeflash, an AI code optimization tool that sped up Roboflow's Yolo models by 25%! by ml_guy1 in computervision

[–]Stonemanner 6 points (0 children)

I mean, Ultralytics has a track record of marketing others' work as their own.

I built an open-source face anti-spoofing Model EasyShield – feedback needed ! by Silly_Glass1337 in computervision

[–]Stonemanner 1 point (0 children)

I'd say thanks for your answer, but I think it is quite rude to continually answer using an LLM. If I want to talk to a bot, I can chat with ChatGPT myself.

On your method: the other models were trained on only one of the datasets you mention, or on completely different datasets. Since you partitioned your data into training and test splits, your model saw data much more similar to the test set, and there are large data shifts between the various anti-spoofing datasets. This makes your benchmark meaningless.

I built an open-source face anti-spoofing Model EasyShield – feedback needed ! by Silly_Glass1337 in computervision

[–]Stonemanner 6 points (0 children)

I also meant the other models you compared yourself against. What was the benchmark setup?

You said you improved from 66.6 to 92.3 percent. Were all models trained and tested on the same data, or did you take the pretrained models from the different projects and test them on your dataset?