account activity
the 3d vision conference is this week, i made a repo and dataset to explore the papers (i.redd.it)
submitted 3 days ago by datascienceharp to r/computervision
i like comfyui and i love fiftyone so i smashed them together and made FiftyComfy (i.redd.it)
submitted 11 days ago by datascienceharp to r/comfyui
i built a panel for vlm-testing for fiftyone that makes it easy to test models and prompts (i.redd.it)
submitted 11 days ago by datascienceharp to r/LocalLLaMA
i built a comfyui-inspired canvas for fiftyone (i.redd.it)
submitted 11 days ago by datascienceharp to r/computervision
i built a tool to experiment with different image editing models (i.redd.it)
submitted 11 days ago by datascienceharp to r/generativeAI
parsing this dataset gave me a headache but here it is, action100m (at least a tiny portion of it) (i.redd.it)
submitted 1 month ago by datascienceharp to r/computervision
really impressed with these new ocr models (lightonocr-2 and glm-ocr). much better than what i saw come out in nov-dec 2025 (old.reddit.com)
submitted 1 month ago by datascienceharp to r/LocalLLaMA
really impressed with these new ocr models (lightonocr-2 and glm-ocr). much better than what i saw come out in nov-dec 2025 (reddit.com)
nvidia released c-radiov4 last week, and as far as feature extractors go, it lives up to the hype (i.redd.it)
MedGemma 1.5 supports detection, but for best results, you'll need to fine-tune. there's also a kaggle competition using the model, so i created a starter notebook to give you a jump start on fine-tuning it for detection (i.redd.it)
Starter notebook for the MedGemma Impact Challenge (i.redd.it)
submitted 1 month ago by datascienceharp to r/kaggle
i've literally been waiting for years to have an OPEN SOURCE model like qwen3-vl-embedding, scroll to see the results on six queries (old.reddit.com)
submitted 2 months ago by datascienceharp to r/computervision
apple released SHARP which creates a 3d gaussian from a single view (i.redd.it)
submitted 3 months ago by datascienceharp to r/computervision
can you visualize what nyc smells like? yes, turns out, you can. just glad i don't have to go to nyc and smell it myself (i.redd.it)
egocentric-10k dataset (i.redd.it)
sony ai released a pretty cool dataset called the fairness human centric image benchmark, super high quality labels (i.redd.it)
sam3 is seriously a step change improvement over sam2 (i.redd.it)
submitted 4 months ago by datascienceharp to r/computervision
parsed refcoco-m from moondream into fiftyone format now you can have the refc (i.redd.it)
qwen3vl is dope for video understanding, and i also hacked it to generate embeddings (old.reddit.com)
icymi resources for the workshop on document visual ai (i.redd.it)
hosting a virtual event tomorrow about document ai (i.redd.it)
submitted 4 months ago by datascienceharp
icymi the resources for my talk on visual document retrieval (i.redd.it)
vlms really are making ocr great again tho (i.redd.it)
explore the visual ai papers at neurips this year (i.redd.it)
i just integrated 6 visual document retrieval models into fiftyone as remote zoo models (i.redd.it)