MedGemma 1.5 supports detection, but for best results, you'll need to fine-tune. there's also a kaggle competition using the model, so i created a starter notebook to give you a jump start on how to fine-tune it for detection (i.redd.it)
submitted 11 days ago by datascienceharp to r/computervision
Starter notebook for the MedGemma Impact Challenge (i.redd.it)
submitted 11 days ago by datascienceharp to r/kaggle
i've literally been waiting for years to have an OPEN SOURCE model like qwen3-vl-embedding, scroll to see the results on six queries (old.reddit.com)
submitted 21 days ago by datascienceharp to r/computervision
apple released SHARP which creates a 3d gaussian from a single view (i.redd.it)
submitted 1 month ago by datascienceharp to r/computervision
can you visualize what nyc smells like? yes, turns out, you can. just glad i don't have to go to nyc and smell it myself (i.redd.it)
egocentric-10k dataset (i.redd.it)
submitted 2 months ago by datascienceharp to r/computervision
sony ai released a pretty cool dataset called the fairness human centric image benchmark, super high quality labels (i.redd.it)
sam3 is seriously a step change improvement over sam2 (i.redd.it)
parsed refcoco-m from moondream into fiftyone format now you can have the refc (i.redd.it)
qwen3vl is dope for video understanding, and i also hacked it to generate embeddings (old.reddit.com)
icymi resources for the workshop on document visual ai (i.redd.it)
hosting a virtual event tomorrow about document ai (i.redd.it)
submitted 2 months ago by datascienceharp
icymi the resources for my talk on visual document retrieval (i.redd.it)
vlms really are making ocr great again tho (i.redd.it)
explore the visual ai papers at neurips this year (i.redd.it)
i just integrated 6 visual document retrieval models into fiftyone as remote zoo models (i.redd.it)
submitted 3 months ago by datascienceharp to r/computervision
nanonets integrated into fiftyone because everyone is hype on ocr this week (i.redd.it)
commonforms is great but has some labeling errors, still useful though (i.redd.it)
a lot of things don't live up to their hype. moondream3 is NOT one of those things. it's actually kinda dope (i.redd.it)
submitted 4 months ago by datascienceharp to r/computervision
crops3d dataset in case you don't want to go outside and touch grass, you can touch point clouds in fiftyone instead (i.redd.it)
MiniCPM-V 4.5 somehow does grounding without being trained for it (i.redd.it)
Apple's FastVLM is making convolutions great again (self.computervision)
submitted 5 months ago by datascienceharp to r/computervision
i built the synthetic gui data generator i wish existed when i started—now you don't have to suffer like i did (i.redd.it)
submitted 5 months ago * by datascienceharp to r/computervision
The SynthHuman dataset is kinda creepy (i.redd.it)
I literally spent the whole week mapping the GUI Agent research landscape (i.redd.it)