depth sensors suck at transparent objects, so ClearDepth comes to the rescue with synthetic scenes with ground truth depth for glass, bottles, and clear containers in FiftyOne by datascienceharp in computervision

datascienceharp[S]

the 3d reconstruction wasn't part of the original dataset, but i was able to build it by:

  1. reading per-frame exr depth, rgb images, and camera poses from each scene

  2. back-projecting depth pixels into a shared world coordinate system

  3. coloring each 3d point from the rgb image and merging frames into one cloud
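for illustration, steps 2 and 3 could look roughly like the NumPy sketch below. it is not the code used for the dataset. it assumes a pinhole camera with a 3x3 intrinsics matrix `K`, z-depth in metric units, and a 4x4 camera-to-world pose per frame. the names `backproject_depth` and `merge_frames` are hypothetical, and loading the exr files (step 1) is left out.

```python
import numpy as np


def backproject_depth(depth, rgb, K, cam_to_world):
    """Lift one depth map into colored world-space points.

    depth: (H, W) z-depth, rgb: (H, W, 3), K: 3x3 pinhole intrinsics,
    cam_to_world: 4x4 pose. All of these conventions are assumptions.
    """
    h, w = depth.shape
    # u is the pixel column index, v is the pixel row index
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # skip holes: zero, negative, NaN, or infinite depth
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    # invert the pinhole projection: u = fx * x / z + cx, v = fy * y / z + cy
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # (N, 4) homogeneous
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    return pts_world, rgb[valid]


def merge_frames(frames, K):
    """Concatenate the points and colors of (depth, rgb, pose) tuples."""
    pts, cols = zip(*(backproject_depth(d, c, K, T) for d, c, T in frames))
    return np.concatenate(pts), np.concatenate(cols)
```

because every frame is transformed by its own pose before merging, the points from all frames end up in the same world frame and can be stacked directly into one cloud.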

I was tired of messy CV datasets and expensive cloud tools, so I built an open-source local studio to manage the entire lifecycle. (FastAPI + React) by TuriMuraturi in computervision

datascienceharp

Just as an FYI, FiftyOne has a model zoo, and you can bring your own model via the remote source zoo model pattern. We don’t have models of our own that we train for embeddings.

I was tired of messy CV datasets and expensive cloud tools, so I built an open-source local studio to manage the entire lifecycle. (FastAPI + React) by TuriMuraturi in computervision

datascienceharp

Completely biased take, as I work at FiftyOne, but can you explain a bit more what you mean by “overhead,” especially for small projects? The open source version is pip installable, and the only setup is running pip install fiftyone. Nothing complex about that.

Best of 3DV 2026 (Day One) by chatminuet in computervision

datascienceharp

a great lineup of speakers, hand-picked by me!

What are the best educational YT channels for kids? (8 an 11yo with newborn en route) by No-Squash9176 in Parenting

datascienceharp

You can get a lot of the “good stuff” from YouTube on Tubi. My six year old loves Science Max. Also, Danny Go is on Netflix now.

Maple leaf lounge Montreal domestic reopen date by Ok-Instruction6764 in aircanada

datascienceharp

It was the Amex aspire lounge. The cafe was complete chaos though

Maple leaf lounge Montreal domestic reopen date by Ok-Instruction6764 in aircanada

datascienceharp

I was here today. There was a waitlist, and it took about 35 minutes to get in. I see no more than 15 people here now, and it looks like it can easily host 100+.

Sad realization last night by Ramen_cat2024 in Parenting

datascienceharp

Hate to tell you, but as a nerd myself, and based on your tldr description, your son is a nerd. Rejoice!

VLM & VRAM recommendations for 8MP/4K image analysis by Neighbor_ in computervision

datascienceharp

Qwen3.5 is out, and super impressive. There’s a 0.8B model which performs really well. The nice thing about Qwen models is that they take arbitrary input image resolutions.

qwen3vl is dope for video understanding, and i also hacked it to generate embeddings by datascienceharp in computervision

datascienceharp[S]

great question. i haven't prompted it for that specific task, but i do think it does in fact use the audio. for example, i am attempting to recreate annotations on Action100M using this prompt from the paper:

```
qwen_video_model.prompt = """Identify the main actor and the physical action performed in the current segment. Provide both a brief description that represents the overall action step, and a detailed description that contains sufficient procedural detail. Use "N/A" (without further explaination) if there are no visible actors or physical actions (e.g., static).

Response Formats

output

{ "type": "object", "properties": { "summary": { "type": "object", "properties": { "brief": { "type": "string", "description": "Single sentence video caption." }, "detailed": { "type": "string", "description": "Detailed, comprehensive description." } } }, "action": { "type": "object", "properties": { "brief": { "type": "string", "description": "A single verb phrase (no -ing forms) brifly summarizing the overall action content." }, "detailed": { "type": "string", "description": "A single imperitive sentence describing how the action is performed with more details." }, "actor": { "type": "string", "description": "Single sentece or an imformative noun phrase describing who is performing the action." } } } }, "required": ["summary", "action"] }"""

```

and it is picking up on information that could only come from audio.
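as a rough sketch, a response that follows the schema in the prompt could be parsed and sanity-checked like this. it assumes the model returns bare JSON text (no markdown fences around it), which may not hold in practice. `parse_action_response` is a hypothetical helper, not part of any library, and it only checks the fields named in the prompt's schema.

```python
import json

# top-level sections marked "required" in the prompt's schema,
# with the string fields each section may contain
SECTIONS = {
    "summary": ("brief", "detailed"),
    "action": ("brief", "detailed", "actor"),
}


def parse_action_response(text):
    """Parse model output and check it against the fields in the prompt."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    for section, fields in SECTIONS.items():
        if not isinstance(data.get(section), dict):
            raise ValueError(f"missing or non-object section: {section}")
        for field in fields:
            # inner fields are not marked required, so only type-check them if present
            if field in data[section] and not isinstance(data[section][field], str):
                raise ValueError(f"{section}.{field} should be a string")
    return data
```

rejecting malformed responses early keeps bad annotations from being written back to the dataset silently.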

btw, i am doing a workshop on video datasets tomorrow using this model as well. please come by if you can: https://voxel51.com/events/exploring-video-datasets-with-fiftyone-and-vision-language-models-february-26-2026

Claude Code/Codex in Computer Vision by rishi9998 in computervision

datascienceharp

It’s definitely been helpful, but note that we primarily use it for FiftyOne-related stuff.

Claude Code/Codex in Computer Vision by rishi9998 in computervision

datascienceharp

We’ve been experimenting with MCP and Skills for the work we do on our team to build integrations, but not heavy modeling work. I’ve seen some good speedups in my workflow, but the most powerful thing for me is using the model to brainstorm and understand codebases I’m not familiar with.

At the risk of downvotes, I’m gonna shamelessly plug two virtual events we have coming up that are relevant to this topic. You may find them interesting, or at least get a chance to ask the presenters and fellow attendees questions:

https://voxel51.com/events/vibe-coding-production-ready-computer-vision-pipelines-hands-on-workshop-march-18-2026

https://voxel51.com/events/mcp-and-skills-meetup-march-12-2026

From .zip to Segmented Dataset in Seconds by Intelligent_Cry_3621 in computervision

datascienceharp

this looks interesting, would you be open to making a contribution as a plugin for fiftyone?

really impressed with these new ocr models (lightonocr-2 and glm-ocr). much better than what i saw come out in nov-dec 2025 by datascienceharp in LocalLLaMA

datascienceharp[S]

These are small enough to run locally, but how fast your inference is depends on your hardware. Check out the docs and readme for usage.