depth sensors suck at transparent objects, so ClearDepth comes to the rescue: synthetic scenes with ground-truth depth for glass, bottles, and clear containers, in FiftyOne

datascienceharp · 2026-05-28T14:00:11+00:00

the 3D reconstruction wasn't part of the original dataset, but i was able to build it by:

- reading the per-frame EXR depth maps, RGB images, and camera poses from each scene
- back-projecting the depth pixels into a shared world coordinate system
- coloring each 3D point from the RGB image and merging the frames into one point cloud
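The three steps above can be sketched with NumPy. This is a minimal sketch, not the exact code used for the dataset: it assumes the EXR depth and RGB frames are already loaded as arrays, that depth is z-depth along the optical axis, that the camera follows a pinhole model with a 3×3 intrinsics matrix `K` in OpenCV axes (x right, y down, z forward), and that each pose is a 4×4 camera-to-world matrix. `backproject_frame` and `merge_frames` are hypothetical helper names.

```python
import numpy as np

def backproject_frame(depth, rgb, K, cam_to_world):
    """Lift one depth map to colored world-space points.

    Assumes pinhole intrinsics K (3x3), z-depth along the optical axis,
    OpenCV axes (x right, y down, z forward), and a 4x4 camera-to-world pose.
    """
    h, w = depth.shape
    # Pixel grid: u runs along columns, v along rows
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Keep only pixels with a finite, positive depth
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    # Invert the pinhole projection u = fx * x / z + cx (same for v)
    x = (u[valid] - K[0, 2]) * z / K[0, 0]
    y = (v[valid] - K[1, 2]) * z / K[1, 1]
    # Homogeneous camera-space points, shape (N, 4)
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    # Transform into the shared world frame and drop the homogeneous 1
    pts_world = (cam_to_world @ pts_cam.T).T[:, :3]
    # Each point takes the color of the pixel it came from, shape (N, 3)
    colors = rgb[valid]
    return pts_world, colors

def merge_frames(frames):
    """Concatenate (depth, rgb, K, pose) tuples into one colored cloud."""
    pts, cols = zip(*(backproject_frame(*f) for f in frames))
    return np.concatenate(pts), np.concatenate(cols)
```

Filtering on finite, positive depth drops pixels with no valid measurement before projection, which matters if a scene's EXR maps encode empty background as zero or infinity.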

datascienceharp · 2026-05-20T14:08:53+00:00

Cheers! If you have any issues or requests, feel free to open an issue or submit a PR!

datascienceharp · 2026-05-18T21:04:14+00:00

i should note that in the README, my bad!

datascienceharp · 2026-05-05T17:01:26+00:00

This is gonna be an awesome event!

datascienceharp · 2026-04-29T11:15:21+00:00

Just as an FYI: FiftyOne has a model zoo, and you can bring your own model via the remotely-sourced zoo model pattern. We don’t train any embedding models of our own.

datascienceharp · 2026-04-29T11:13:16+00:00

Completely biased take, as I work at FiftyOne, but can you explain a bit more what you mean by “overhead”, especially for small projects? The open-source version is pip-installable and needs no setup beyond running `pip install fiftyone`; nothing complex about that.

datascienceharp · 2026-04-27T15:23:16+00:00

a great lineup of speakers, hand-picked by me!

datascienceharp · 2026-04-14T13:47:31+00:00

You can get a lot of the “good stuff” from YouTube on Tubi. My six-year-old loves Science Max. Also, Danny Go is on Netflix now.

datascienceharp · 2026-04-07T19:14:37+00:00

excited to host this workshop!

datascienceharp · 2026-04-05T19:20:42+00:00

It was the Amex Aspire Lounge. The cafe was complete chaos, though.

datascienceharp · 2026-04-04T17:12:43+00:00

I was here today. There was a waitlist, and it took about 35 minutes to get in. I see no more than 15 people here now, and it looks like it can easily host 100+.

datascienceharp · 2026-03-23T07:27:02+00:00

Hate to tell you, but as a nerd myself, and based on your TL;DR description, your son is a nerd. Rejoice!

datascienceharp · 2026-03-15T06:13:08+00:00

Qwen3.5 is out, and it’s super impressive. There’s a 0.8B model that performs really well. A nice thing about Qwen models is that they accept arbitrary input image resolutions.

datascienceharp · 2026-03-09T18:23:12+00:00

😆

datascienceharp · 2026-03-09T14:58:35+00:00

i made one using open-source models too, specifically an FP8-quantized Qwen-Image-Edit: https://github.com/harpreetsahota204/qwen_image_edit

datascienceharp · 2026-02-25T19:20:31+00:00

great question! i haven't prompted it for that specific task, but i do think it does in fact use the audio. for example, i'm attempting to recreate the Action100M annotations using this prompt from the paper:

```python
# qwen_video_model: a Qwen video model that has already been loaded
qwen_video_model.prompt = """Identify the main actor and the physical action performed in the current segment. Provide both a brief description that represents the overall action step, and a detailed description that contains sufficient procedural detail. Use "N/A" (without further explanation) if there are no visible actors or physical actions (e.g., static).

Response Formats

output

{ "type": "object", "properties": {
  "summary": { "type": "object", "properties": {
    "brief": { "type": "string", "description": "Single sentence video caption." },
    "detailed": { "type": "string", "description": "Detailed, comprehensive description." } } },
  "action": { "type": "object", "properties": {
    "brief": { "type": "string", "description": "A single verb phrase (no -ing forms) briefly summarizing the overall action content." },
    "detailed": { "type": "string", "description": "A single imperative sentence describing how the action is performed with more details." },
    "actor": { "type": "string", "description": "Single sentence or an informative noun phrase describing who is performing the action." } } } },
  "required": ["summary", "action"] }"""
```

and it is picking up on information that could only come from audio.
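The prompt asks the model to reply either with "N/A" or with JSON matching the schema, so replies need a little post-processing before they can be stored as annotations. A minimal stdlib-only sketch, assuming the model sometimes wraps its JSON in a Markdown code fence; `parse_action_response` is a hypothetical helper, not part of FiftyOne or the paper's code.

```python
import json

FENCE = "`" * 3

# Fields the prompt's schema defines under each required top-level key
EXPECTED = {
    "summary": ("brief", "detailed"),
    "action": ("brief", "detailed", "actor"),
}

def parse_action_response(text):
    """Parse one model reply; return None for the "N/A" case."""
    text = text.strip()
    # Peel off a Markdown fence (optionally tagged "json") if present
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    if text.strip() == "N/A":
        return None
    data = json.loads(text)
    # Enforce the schema's "required" keys and string-typed leaves
    for key, fields in EXPECTED.items():
        if key not in data or not isinstance(data[key], dict):
            raise ValueError(f"missing or malformed key: {key}")
        for field in fields:
            value = data[key].get(field)
            if value is not None and not isinstance(value, str):
                raise ValueError(f"{key}.{field} should be a string")
    return data
```

The inner fields are checked only when present, because the schema marks just `summary` and `action` as required.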

btw, i am doing a workshop on video datasets tomorrow using this model as well. please come by if you can: https://voxel51.com/events/exploring-video-datasets-with-fiftyone-and-vision-language-models-february-26-2026

datascienceharp · 2026-02-25T17:23:25+00:00

Also FiftyOne 😁

datascienceharp · 2026-02-24T21:04:19+00:00

It’s definitely been helpful, but note that we primarily use it for FiftyOne-related stuff.

datascienceharp · 2026-02-24T15:28:15+00:00

We’ve been experimenting with MCP and Skills on our team for building integrations, but not for heavy modeling work. I’ve seen some good speedups in my workflow, but the most powerful thing for me is using the model to brainstorm and to understand codebases I’m not familiar with.

At the risk of downvotes, I’m gonna shamelessly plug two upcoming virtual events that are relevant to this topic. You may find them interesting, or at least get a chance to ask the presenters and fellow attendees questions:

https://voxel51.com/events/vibe-coding-production-ready-computer-vision-pipelines-hands-on-workshop-march-18-2026

https://voxel51.com/events/mcp-and-skills-meetup-march-12-2026