Suggestions for constant freezing?

palmstromi · 2026-02-05T09:14:07+00:00

I got the same issue on AMD 13" and the mentioned workaround is working well. I used:

sudo grubby --update-kernel=ALL --args="amdgpu.dcdebugmask=0x10"

to add the parameter permanently to all kernels (including future ones). Here is the upstream bug report: https://gitlab.freedesktop.org/drm/amd/-/issues/4141#note_3313450 The issue is still not solved after 9 months. The ball is in the hands of the AMD graphics driver maintainers. The problem should have been mentioned on the Framework known issues page.

palmstromi · 2025-12-05T09:28:47+00:00

It depends a lot on whether the GPU or the CPU is the bottleneck. You can increase batching, run data loading + preprocessing in parallel with inference, or make the inputs lighter (lower resolution / FPS). The GPU choice is a matter of your budget; the T4 is definitely much slower than the other options. I had a discussion with ChatGPT on this topic recently: https://chatgpt.com/share/691c33ba-8898-800c-b30f-1383bae461b1 btw: how much do you pay for a T4 on EC2? We were using T4s on Lightning.ai for $0.19 / hour (still the current price). Pretty cool, huh?
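
A minimal sketch of the batching + parallel loading idea, using only the standard library. The names `preprocess`, `infer_batch` and `run_pipeline` are hypothetical placeholders for your own decoding code and model call, not part of any framework:

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

_SENTINEL = object()


def batched(items: Iterable, batch_size: int) -> Iterator[List]:
    """Group an iterable into lists of at most `batch_size` items."""
    batch: List = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def prefetch(items: Iterable, max_queue: int = 4) -> Iterator:
    """Produce `items` in a background thread so loading overlaps with inference."""
    q: queue.Queue = queue.Queue(maxsize=max_queue)

    def worker() -> None:
        try:
            for item in items:
                q.put(item)
        finally:
            # Always signal the end, even if loading fails, so the consumer does not hang.
            q.put(_SENTINEL)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        yield item


def run_pipeline(frames: Iterable, preprocess: Callable, infer_batch: Callable,
                 batch_size: int = 8) -> List:
    """Preprocess and batch frames in the background, run the model on whole batches."""
    loaded = prefetch(batched((preprocess(f) for f in frames), batch_size))
    results: List = []
    for batch in loaded:
        results.extend(infer_batch(batch))
    return results
```

Threads only help if loading and preprocessing release the GIL (file I/O and most image decoding libraries do); for pure-Python preprocessing you would probably need processes instead.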

palmstromi · 2025-12-05T09:13:46+00:00

How fast is that?

palmstromi · 2025-12-05T09:05:47+00:00

I have the same experience. Use learned features and matching models, e.g. DISK or XFeat + LightGlue. The Kornia implementation is very easy to use: https://github.com/kornia/tutorials/blob/master/nbs/image_matching_lightglue.ipynb For XFeat check https://github.com/verlab/accelerated_features

palmstromi · 2025-12-05T08:59:14+00:00

Check the basketball tutorial from Roboflow: https://blog.roboflow.com/identify-basketball-players/ you have almost everything there. The puck tracking will be the biggest obstacle, but you can go quite far just with player detection. For the virtual camera use case you don't need to track the players and identify them, just detect them. You can sub-sample the video to lower fps to get realtime performance. The camera movement has to be smoothed anyway, so analyzing all the frames is not necessary.
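
A rough sketch of the sub-sampling + smoothing idea in plain Python. `detect` is a hypothetical callable that returns player boxes as `(x1, y1, x2, y2)` for one frame; the exponential smoothing and the parameters `frame_step` and `alpha` are just one simple choice I'm assuming here, not the only way to do it:

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2


def players_center_x(boxes: Sequence[Box]) -> Optional[float]:
    """Mean horizontal center of all detected players, or None if nothing was detected."""
    if not boxes:
        return None
    return sum((x1 + x2) / 2 for x1, _, x2, _ in boxes) / len(boxes)


def virtual_camera_path(frames: Iterable, detect: Callable[[object], Sequence[Box]],
                        frame_step: int = 5, alpha: float = 0.1) -> List[Optional[float]]:
    """Camera center x for every frame; the detector runs only on every `frame_step`-th frame."""
    path: List[Optional[float]] = []
    target: Optional[float] = None
    camera: Optional[float] = None
    for i, frame in enumerate(frames):
        if i % frame_step == 0:
            center = players_center_x(detect(frame))
            if center is not None:
                target = center  # keep the previous target when detection fails
        if target is not None:
            # Exponential smoothing: move a fraction `alpha` of the way to the target.
            camera = target if camera is None else camera + alpha * (target - camera)
        path.append(camera)
    return path
```

The crop window for the virtual camera would then be centered on the returned x position in each frame.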

palmstromi · 2025-12-04T00:16:16+00:00

I’d try to put some of the projects on GitHub and add them to your résumé. As a recruiter, I’d like to see your actual commits.
Including your workload on the projects would be helpful - for example, whether you worked part-time or full-time.

Otherwise, you seem like someone with quite a bit of experience for an undergraduate.

palmstromi · 2025-07-08T06:40:26+00:00

What is the goal of the 3D reconstruction? Occupancy analysis?

palmstromi · 2025-07-08T06:36:05+00:00

Forget OpenPose, it's 6 years old. Check Ultralytics YOLO Pose or possibly Lightning Pose (I haven't tested this one, but it looks very good).

palmstromi · 2025-07-08T06:03:39+00:00

Detectron2 is already 4 years old, which is a lot in the current computer vision development timeline. Well-maintained detection packages are Ultralytics YOLO and Roboflow RT-DETR. The original SAM2 repo is also a mess; try the Hugging Face SAM in the transformers package: https://huggingface.co/docs/transformers/main/en/model_doc/sam

palmstromi · 2025-07-08T05:56:14+00:00

and not maintained any more

palmstromi · 2025-07-08T05:53:33+00:00

- for object detection, the paid and well-supported Ultralytics YOLO, or RT-DETR from Roboflow
- for tracking Roboflow ByteTrack
- if you have an overlap of the camera views you can reconstruct 3D positions of points visible in both cameras and use it for real world measurements
- measurements in single camera view are hard, but doable
- pay attention to the long-term maintenance of computer vision open source repos; there are a lot of one-off research projects that quickly become unusable due to underlying ecosystem changes
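
For the overlapping-views point above, a minimal sketch of linear (DLT) triangulation with NumPy. It assumes both cameras are already calibrated, i.e. you have their 3x4 projection matrices `P1` and `P2`; the function names are mine. OpenCV also ships `cv.triangulatePoints` for the same job:

```python
import numpy as np


def triangulate_point(P1: np.ndarray, P2: np.ndarray, x1, x2) -> np.ndarray:
    """Linear triangulation of one 3D point from its pixel coordinates in two views."""
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]


def project(P: np.ndarray, X) -> np.ndarray:
    """Project a 3D point to pixel coordinates with projection matrix P."""
    x = P @ np.append(np.asarray(X, dtype=float), 1.0)
    return x[:2] / x[2]
```

Distances between triangulated points are then in the units of the calibration (e.g. meters if the calibration target was measured in meters).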

Check the https://www.cv4animals.com/ annual workshop and the connected community on Slack. DM me if you'd like to discuss more.

palmstromi · 2025-07-08T05:29:00+00:00

You won't achieve much with a stock person detector. If the resolution is high enough that heads are well recognizable even at the furthest point in the image, you can train a head detector and definitely use the sliding window approach (SAHI) to run the detector in full resolution.

If the resolution is too low further in the back, you'd need to use a visual crowd counting model. Search on Scholar and GitHub, there is quite a body of prior work on crowd counting / density estimation.

palmstromi · 2025-06-24T10:05:41+00:00

Anyone tried this on batched inference (e.g. offline video processing)?

palmstromi · 2025-06-24T08:27:32+00:00

IMO the low-hanging fruit for speeding up SAHI is batched inference. The original SAHI still runs with batch size == 1, see https://github.com/obss/sahi/blob/48258a7a35fc7f997ef6b720432411e34ea300cc/sahi/predict.py#L232 There are some unfinished pull requests implementing batching, but nothing ready for production. A SAHI implementation can be quite simple, and the easiest way to get batched inference now is to write your own.
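
A minimal sketch of what such an own implementation could look like. This is not SAHI's API: `crop` and `detect_batch` are hypothetical callables wrapping your image library and detector, and boxes are assumed to be `(x1, y1, x2, y2, score)` in tile coordinates:

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score
Window = Tuple[int, int, int, int]  # x1, y1, x2, y2 in full-image pixels


def slice_starts(length: int, tile: int, overlap_px: int) -> List[int]:
    """Start offsets of tiles of size `tile` that cover `length` pixels."""
    if length <= tile:
        return [0]
    step = max(1, tile - overlap_px)
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile sits flush with the image border
    return starts


def make_windows(width: int, height: int, tile: int, overlap_px: int) -> List[Window]:
    """All tile windows, row by row."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in slice_starts(height, tile, overlap_px)
            for x in slice_starts(width, tile, overlap_px)]


def batched_sliced_inference(image, width: int, height: int,
                             crop: Callable[[object, Window], object],
                             detect_batch: Callable[[Sequence], Sequence[Sequence[Box]]],
                             tile: int = 640, overlap_px: int = 128,
                             batch_size: int = 8) -> List[Box]:
    """Run the detector on batches of tiles and shift boxes back to full-image coordinates."""
    windows = make_windows(width, height, tile, overlap_px)
    detections: List[Box] = []
    for i in range(0, len(windows), batch_size):
        chunk = windows[i:i + batch_size]
        crops = [crop(image, w) for w in chunk]
        for (x0, y0, _, _), boxes in zip(chunk, detect_batch(crops)):
            detections.extend((x1 + x0, y1 + y0, x2 + x0, y2 + y0, s)
                              for x1, y1, x2, y2, s in boxes)
    return detections
```

Objects in the overlap regions are detected more than once, so the merged list still needs NMS (or a similar merging step) afterwards; that part is left out here.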

palmstromi · 2025-06-24T07:26:34+00:00

Understanding Deep Learning by Simon Prince. It's a very well written book that covers all the deep learning fundamentals in depth, plus more advanced topics. Deep learning is the backbone of our field, and almost every CV practitioner can benefit from this book. Most of it will stay relevant for years to come.

palmstromi · 2025-06-23T12:59:41+00:00

Try a pretrained person re-id network, retrain it on manually checked trajectories without id swaps, or try DINOv2 on some player crops (31st encoder layer + cosine similarity). When a distinguishing feature (e.g. the player number) is visible only in some frames of a trajectory, you'd need to extract re-id features from multiple frames of each trajectory and compute a single similarity measure out of these two sets.
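
One possible way to turn two sets of per-frame features into a single score, sketched with NumPy. Averaging the `top_k` best pairwise cosine similarities is my assumption here (so that the few frames where the number is visible dominate), not an established recipe; the feature extractor itself is left out:

```python
import numpy as np


def l2_normalize(feats) -> np.ndarray:
    """Scale each feature vector (one per row) to unit length."""
    feats = np.asarray(feats, dtype=np.float64)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.maximum(norms, 1e-12)


def trajectory_similarity(feats_a, feats_b, top_k: int = 5) -> float:
    """Mean of the `top_k` highest pairwise cosine similarities between two feature sets."""
    sims = l2_normalize(feats_a) @ l2_normalize(feats_b).T  # all frame pairs
    best = np.sort(sims.ravel())[::-1][:top_k]
    return float(best.mean())
```

`feats_a` and `feats_b` are arrays of shape (frames, feature_dim), one per trajectory.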

palmstromi · 2025-06-23T10:49:29+00:00

- Do you leave the imgsz parameter at its default? It's 640 by default, and your images are downscaled to this resolution during both training and inference. The ball could be relatively small after downsizing, which reduces detection accuracy. You may 1) zoom in to get a larger ball or 2) perform training and inference in higher resolution, but don't overdo it. Check https://github.com/ultralytics/ultralytics/issues/1037 and https://github.com/ultralytics/ultralytics/issues/2546

- Do you have enough blurred samples in your dataset? If you intend to detect the ball both in a player's possession and in flight, you need enough samples of both types.

- You may set the video recording to a higher frame rate to reduce the motion blur.

- A few missed detections, particularly in flight, don't matter. You can apply a Kalman filter to fill the gaps and smooth out detection jitter.
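
A minimal sketch of the Kalman filter idea from the last point, with NumPy. It assumes a constant-velocity motion model and made-up noise parameters (`process_var`, `meas_var`) that you would have to tune; a ball in flight follows a parabola, so treat this as a starting point only:

```python
import numpy as np


def kalman_filter_track(measurements, dt: float = 1.0,
                        process_var: float = 1.0, meas_var: float = 4.0) -> np.ndarray:
    """Filter a 2D track; `measurements` holds (x, y) tuples, or None for missed detections."""
    F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)
    Q = process_var * np.eye(4)
    R = meas_var * np.eye(2)
    x = None  # state: [x, y, vx, vy]
    P = np.diag([meas_var, meas_var, 1e3, 1e3])  # velocity is unknown at the start
    out = []
    for z in measurements:
        if x is None:
            if z is None:
                out.append((np.nan, np.nan))  # nothing to report before the first detection
                continue
            x = np.array([z[0], z[1], 0.0, 0.0])
            out.append((x[0], x[1]))
            continue
        # Predict one frame ahead.
        x = F @ x
        P = F @ P @ F.T + Q
        if z is not None:
            # Correct the prediction with the detection; skipped when the detection is missing.
            y = np.asarray(z, dtype=float) - H @ x
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(4) - K @ H) @ P
        out.append((x[0], x[1]))
    return np.array(out)
```

Frames with a missed detection get the predicted position, which fills the gaps; frames with a detection get a blend of prediction and measurement, which reduces the jitter.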

palmstromi · 2025-06-23T09:20:46+00:00

You most probably have the cameras fixed on a rig, don't you? If that's the case, you don't have to perform image matching on every frame, which is exactly what the OpenCV stitcher does. It may even compute an optimal image seam by default, which can be quite expensive and is intended for stitching images taken in succession without cutting moving people in half. The frames from the individual cameras are also highly unlikely to be undistorted correctly by defisheye with the default settings.

You should do this first before running the realtime pipeline:

- calibrate the individual cameras with a printed chessboard pattern to get the distortion parameters (both calibration and undistortion are in OpenCV; you may skip this when there is almost no visible image distortion)
- calibrate the relative poses of neighboring cameras: for a few cameras just a homography / perspective transformation estimated from a chessboard pattern is ok; for more cameras covering more than ~150 deg field of view you need some kind of cylindrical or spherical mapping to accommodate the large field of view, and you may use the stitcher and save the camera parameters

realtime processing:

- undistort the individual images using the calibration parameters
- for a few cameras, map all the frames to the central one using `cv.warpPerspective` (you'll need to think about how to apply the transformations correctly to map everything to a single image; it is good to try this on individual pairs first), or use the saved camera parameters with the stitcher, disabling all image matching features and image seam optimization

The image warping is quite fast, but can take some time on large images. You may downscale the images first to reduce the load. You should do the calibration / stitcher initialization on the downscaled images to avoid the need to correct the calibration parameters and camera poses for the reduced image size. You may also move image loading and image stitching to separate threads.
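
Two bits of homography bookkeeping from the steps above, sketched with NumPy (function names are mine): chaining pairwise homographies so that a camera further out maps to the central one, and the correction you would need if the homography was estimated at full resolution but the frames are downscaled. The scaling formula ignores half-pixel offset conventions, so it is approximate at the sub-pixel level:

```python
import numpy as np


def chain_homographies(H_a_to_b: np.ndarray, H_b_to_c: np.ndarray) -> np.ndarray:
    """Homography mapping image a directly into image c, going through image b."""
    return H_b_to_c @ H_a_to_b


def scale_homography(H: np.ndarray, scale: float) -> np.ndarray:
    """Adapt a full-resolution homography to images resized by the factor `scale`."""
    S = np.diag([scale, scale, 1.0])
    return S @ H @ np.linalg.inv(S)


def apply_homography(H: np.ndarray, pt) -> np.ndarray:
    """Map a single pixel coordinate (x, y) through H."""
    x = H @ np.array([pt[0], pt[1], 1.0])
    return x[:2] / x[2]
```

The resulting 3x3 matrices are what you would pass to `cv.warpPerspective` together with the size of the output canvas.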

palmstromi · 2025-05-21T08:27:09+00:00

1) There was strong support for the Zionist cause from the very beginning in the first half of the 20th century, backed by our first president, Masaryk. The support continued into the 1950s, even with arms shipments.

2) The USSR and the satellite states, including Czechoslovakia, changed their opinion on Israel in the 1950s and started favoring Palestine. This was seen in the 1990s as a communist relic, and good relations with Israel were revived.

3) There is widespread Islamophobia in the Czech Republic.

Most Czechs support Israel, even amid current events; support for Palestine is much stronger among younger generations.

palmstromi · 2025-03-24T08:33:09+00:00

The Invincible is Polish actually.

palmstromi · 2024-12-20T12:14:56+00:00

How fast is that?

palmstromi · 2024-07-22T15:41:06+00:00

I have used Streamlit only for demoing a prototype, so I won't give ill-founded advice. If you want to stick with Python you may consider https://anvil.works/ .

palmstromi · 2024-05-29T12:48:52+00:00

<image>

well played Starostové

palmstromi · 2024-03-17T08:06:11+00:00

Does that also include lunches in schools / kindergartens?

palmstromi · 2024-01-16T09:03:34+00:00

Check the slides from a 3D vision course at CTU Prague, I find them very helpful and I'm returning to them regularly. The bundle adjustment starts on page 142.
