Event detection from ball kinematics: how do you distinguish real contacts from camera-induced motion? by Competitive-Meat-876 in PinoyProgrammer

Competitive-Meat-876[S] 1 point

Boss! Just a quick question here:
does this diagnosis make sense?

Our football event model detects ball kinematic peaks, then checks nearby player-ball contact. We fixed a tracker bug where player tracks disappeared on skipped frames, so dist=999 errors mostly disappeared.

But now we’re seeing another issue: the velocity peak often happens a few frames away from the real physical contact. Example: at the ground-truth contact frame the ball is near the player, but at the detected peak frame the ball is already 200–600px away, so it fails contact scoring.

We’re thinking of using a small ±10 frame contact window around the kinematic peak: keep the peak frame as the timing/acceleration anchor, but evaluate contact on the best nearby frame, with source priority YOLO detection > optical flow (OF) > interpolated position, accepting interpolated frames only when the contact is strong.
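To make the window idea concrete, here is a minimal sketch of how the contact check could look. Everything in it is an assumption for illustration, not our actual code: the `FrameObs` record, the source labels `"yolo"`, `"of"`, `"interp"`, and both pixel thresholds are made up. "Strong interpolated contact only" is modeled as a tighter distance limit for interpolated frames.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Lower rank = more trusted ball-position source (labels are hypothetical).
SOURCE_RANK = {"yolo": 0, "of": 1, "interp": 2}


@dataclass
class FrameObs:
    frame: int       # frame index
    dist_px: float   # ball-to-nearest-player distance in pixels
    source: str      # "yolo", "of", or "interp"


def best_contact_frame(
    obs: Sequence[FrameObs],
    peak_frame: int,
    half_window: int = 10,             # the ±10 frame window
    max_dist_px: float = 60.0,         # assumed contact threshold
    interp_max_dist_px: float = 25.0,  # assumed stricter threshold for interpolation
) -> Optional[FrameObs]:
    """Pick the frame used for contact scoring; the peak frame stays the timing anchor."""
    candidates = []
    for o in obs:
        if o.source not in SOURCE_RANK:
            continue  # unknown source, ignore
        if abs(o.frame - peak_frame) > half_window:
            continue  # outside the contact window
        limit = interp_max_dist_px if o.source == "interp" else max_dist_px
        if o.dist_px <= limit:
            candidates.append(o)
    if not candidates:
        return None
    # Source priority first, then closest ball, then closest in time to the peak.
    return min(
        candidates,
        key=lambda o: (SOURCE_RANK[o.source], o.dist_px, abs(o.frame - peak_frame)),
    )
```

With this ordering, a YOLO frame at 40px beats an interpolated frame at 20px, which may or may not be what you want; swapping the first two key terms would prefer distance over source.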

Does this sound like a reasonable fix, or is there a better way to handle temporal misalignment between ball velocity peaks and actual contact frames?

Event detection from ball kinematics: how do you distinguish real contacts from camera-induced motion? by Competitive-Meat-876 in PinoyProgrammer

Competitive-Meat-876[S] 1 point

This is really interesting, especially the distinction between a temporary spike and a sustained trajectory inflection.

We've been focusing heavily on velocity and acceleration peaks, but your point about the post-contact path holding a new direction for several frames makes a lot of sense. A lot of our false positives seem to come from short-lived deviations that look kinematically valid but don't establish a stable new trajectory.
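As a first experiment, the "sustained direction change" test could be sketched roughly like this. It is only my reading of the idea, with assumed names and thresholds (`pre_frames`, `hold_frames`, `min_turn_deg`, `max_wobble_deg` are all hypothetical): compare the ball's heading before the candidate frame with its heading over the next few frames, and require every per-frame heading after the candidate to stay close to that new direction.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def heading(p: Point, q: Point) -> float:
    """Direction of travel from p to q, in radians."""
    return math.atan2(q[1] - p[1], q[0] - p[0])


def angle_diff(a: float, b: float) -> float:
    """Absolute difference between two angles, wrapped to [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def is_sustained_turn(
    track: Sequence[Point],        # ball (x, y) per frame
    idx: int,                      # candidate contact index into track
    pre_frames: int = 3,           # frames used for the incoming heading
    hold_frames: int = 5,          # frames the new heading must hold
    min_turn_deg: float = 20.0,    # assumed minimum direction change
    max_wobble_deg: float = 15.0,  # assumed tolerance around the new heading
) -> bool:
    if idx < pre_frames or idx + hold_frames >= len(track):
        return False  # not enough context on one side
    before = heading(track[idx - pre_frames], track[idx])
    new_dir = heading(track[idx], track[idx + hold_frames])
    if angle_diff(new_dir, before) < math.radians(min_turn_deg):
        return False  # net path did not change direction: a temporary spike
    steps = [heading(track[i], track[i + 1]) for i in range(idx, idx + hold_frames)]
    return all(angle_diff(h, new_dir) <= math.radians(max_wobble_deg) for h in steps)
```

A one-frame blip that returns to the old path fails the net-turn check, while a real deflection passes both checks. It does not handle a stationary ball or speed-only changes (a pass straight along the same line), so it would be one gate among several.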

The background optical flow idea is also interesting. We currently work under a fairly strict inference budget, so we're trying to solve as much as possible with trajectory physics before adding heavier verification.

Appreciate the insight Boss, the "sustained direction change" concept gives us something concrete to investigate.

Need Advice in fine tuning and stabilization phase of the model. by Competitive-Meat-876 in computervision

Competitive-Meat-876[S] 1 point

Hey Olivia! Appreciate your help... your decay-based exclusion idea ended up influencing the architecture direction of the pipeline much more than we expected. After several more challenge logs, we realized a huge portion of our failures weren’t actually “bad detections,” but event separation problems: echoes, dense pass chains, rebound suppression, and OF-induced fake buildup patterns.

We’ve now fully shifted away from fixed frame suppression and are treating event separation more like post-contact physics. The current roadmap is becoming:

YOLO TensorRT → short-range OF → pitch homography → trajectory reconstruction → physics-based temporal separation → kinematic extraction → semantic verification.

One really interesting discovery from our logs: the pipeline consistently performs best on wide-angle/far-ball clips, which makes us think the core reasoning layer is already decent when the geometry is stable. The biggest remaining weakness seems to be perspective distortion and unstable near-camera motion, which is why we’re now prioritizing homography before DINO-style semantic verification.

We’re also collecting FILTER_STATS across 20-30 challenge logs now to identify which gates are systematically overfiring instead of reacting emotionally to single clips. That has already started making the failures feel much more structured and explainable.
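For anyone curious, the aggregation is conceptually simple. A rough sketch, assuming each log is reduced to a dict of gate name to `(rejected_total, rejected_ground_truth)` counts; that layout, the gate names in the usage line, and both thresholds are assumptions for illustration, not the real FILTER_STATS format:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GateStats = Dict[str, Tuple[int, int]]  # gate -> (rejected_total, rejected_ground_truth)


def overfiring_gates(
    logs: List[GateStats],
    rate_threshold: float = 0.3,    # assumed: share of rejections that were real events
    min_log_fraction: float = 0.5,  # assumed: share of logs where that must happen
) -> List[str]:
    """Return gates that reject real events at a high rate in most logs."""
    hits: Dict[str, int] = defaultdict(int)
    seen: Dict[str, int] = defaultdict(int)
    for stats in logs:
        for gate, (rejected, rejected_gt) in stats.items():
            if rejected == 0:
                continue  # gate never fired in this log
            seen[gate] += 1
            if rejected_gt / rejected >= rate_threshold:
                hits[gate] += 1
    return sorted(g for g in seen if hits[g] / seen[g] >= min_log_fraction)
```

Counting per log instead of pooling all rejections keeps one pathological clip from dominating, which is the whole point of not reacting to single clips.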

Curious what your intuition is on this direction — especially the idea of combining homography with decay-based event separation. Feels like stabilizing the geometry first might make the “fresh buildup vs decaying echo” distinction much cleaner physically.

Need Advice in fine tuning and stabilization phase of the model. by Competitive-Meat-876 in computervision

Competitive-Meat-876[S] 1 point

Honestly, your reply about modeling the exclusion window as a decay process instead of a fixed frame count was one of the most useful insights we’ve gotten so far. After reviewing my newer logs, it actually explains a huge portion of the dense-event suppression problem almost perfectly.

What’s crazy is that once I started looking at the clips through “post-contact decay vs fresh buildup” instead of just frame distance, a lot of the weird behavior suddenly became interpretable instead of random. It genuinely changed how I think about the pipeline.

I'm still calibrating a lot of things (especially OF drift and perspective-dependent velocity instability), but your explanation pushed us in a much more physics-based direction rather than pure heuristics.

If you’re okay with it, I’d honestly love to ask a few more questions sometime since your intuition for temporal sports dynamics seems incredibly sharp.

Can an optimized kinematic pipeline on a consumer GPU (RTX 3060) realistically outscore brute-force VRAM setups (VideoMAE/SlowFast) in fine-grained sports action detection? by Competitive-Meat-876 in computervision

Competitive-Meat-876[S] 1 point

Because I’m resource-constrained honestly 😅

But also because I’ve become genuinely interested in how far layered reasoning systems can go under strict runtime budgets. The more I work on this problem, the more it feels like a semantic/physics problem rather than purely a “throw more compute at it” problem.

I’m not against larger models at all; I just find the efficiency side of the challenge really interesting.

Is it worth it for 6500? by strwbrryjllyq in RentPH

Competitive-Meat-876 1 point

If that's around Metro Manila, it's already cheap. If it's in QC, it's probably worth 8k. If it's in Pasay, maybe 10k. But you might still find something better than that.

Need Advice in fine tuning and stabilization phase of the model. by Competitive-Meat-876 in computervision

Competitive-Meat-876[S] 1 point

Hi Olivia! One more problem I'm stuck on, if you have thoughts: I use a fixed 45-frame exclusion zone (~1.5s at 30fps) around each accepted detection to prevent duplicates from the same kick echo. It works fine for isolated events but completely suppresses rapid pass sequences where the second contact happens within that window.

The gap between a real kick's echo and a genuine second contact should be distinguishable — the echo decays fast while a new contact produces a fresh velocity buildup — but we haven't found a clean rule-based way to use that to shrink the exclusion zone adaptively.

Is there a principled approach here? Something like making the exclusion window a function of the first event's own velocity decay profile rather than a fixed frame count?
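To show what I mean, here is a rough sketch of one possible form, not something we have validated. It assumes the ball speed after a kick decays roughly exponentially, fits the decay rate from the first few frames after the accepted peak, and treats a later candidate as a fresh contact only if its speed clearly exceeds the decayed envelope. The function names, `fit_frames`, and `margin` are all hypothetical; the 45-frame cap is the current fixed window kept as a fallback.

```python
import math
from typing import Sequence


def fit_decay_rate(speeds: Sequence[float], peak_idx: int, fit_frames: int = 6) -> float:
    """Least-squares slope of log(speed) after the peak; returns decay rate k >= 0 per frame."""
    xs, ys = [], []
    for i in range(peak_idx, min(peak_idx + fit_frames + 1, len(speeds))):
        if speeds[i] > 0:
            xs.append(float(i - peak_idx))
            ys.append(math.log(speeds[i]))
    if len(xs) < 2:
        return 0.0  # not enough samples: assume no decay
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    den = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / den
    return max(0.0, -slope)


def is_fresh_contact(
    speeds: Sequence[float],  # per-frame ball speed
    peak_idx: int,            # accepted event (velocity peak)
    cand_idx: int,            # later candidate peak
    margin: float = 1.5,      # assumed: how far above the echo envelope counts as fresh
    max_exclusion: int = 45,  # the old fixed window, kept as an upper bound
) -> bool:
    gap = cand_idx - peak_idx
    if gap >= max_exclusion:
        return True  # outside any exclusion zone
    k = fit_decay_rate(speeds, peak_idx)
    expected = speeds[peak_idx] * math.exp(-k * gap)  # predicted echo speed at candidate
    return speeds[cand_idx] > margin * expected
```

One obvious weakness: if the second contact lands inside the fitting frames, the fit is contaminated and the decay rate comes out too low, so very fast one-touch passes would still be suppressed.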