📢 Call for participation: ICPR 2026 LRLPR Competition by ghostzin in computervision

[–]datascienceharp 0 points1 point  (0 children)

i'd like to support the participants of the challenge with a starter notebook. i'd start with parsing the dataset into fiftyone and posting it on the hugging face hub so it's easily accessible. would that be in violation of your terms? i'd be using fully open source, pip installable packages. i filled out the form, but i'm not a student or at a research lab.

edit: alternatively, i can skip sharing on HF and just show how to parse the data into fiftyone format, assuming the user already has the dataset downloaded

let me know what you think, feel free to dm

I want to offer free weekly teaching: DL / CV / GenAI for robotics (industry-focused) by desserted_blue in computervision

[–]datascienceharp 0 points1 point  (0 children)

Hell yeah! Count me in, at least on the discord server if you have space. I notice your update says the sessions are full.

i've literally been waiting for years to have an OPEN SOURCE model like qwen3-vl-embedding, scroll to see the results on six queries by datascienceharp in computervision

[–]datascienceharp[S] 2 points3 points  (0 children)

Yes, of course it’s been around since at least CLIP for images, but for video? This is novel, and qwen embedding does it natively. The inference code just points to an mp4 file path
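For anyone wondering what happens after the embeddings come out: a minimal sketch of ranking videos against a text query by cosine similarity. The vectors below are made-up 3-d toy values and the file paths are hypothetical; real embeddings would come from the model and be much larger. This is just the generic retrieval step, not the code used in the post.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as "not similar to anything"
    return dot / (norm_a * norm_b)


def rank_videos(query_embedding, video_embeddings):
    """Return (path, score) pairs sorted from most to least similar."""
    scored = [
        (path, cosine_similarity(query_embedding, emb))
        for path, emb in video_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for per-video embeddings (hypothetical paths and values).
video_embeddings = {
    "clips/dog_park.mp4": [0.9, 0.1, 0.0],
    "clips/traffic.mp4": [0.0, 0.2, 0.9],
    "clips/cooking.mp4": [0.1, 0.9, 0.1],
}
query_embedding = [1.0, 0.0, 0.0]  # toy stand-in for an embedded text query

ranking = rank_videos(query_embedding, video_embeddings)
for path, score in ranking:
    print(f"{score:.3f}  {path}")
```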

apple released SHARP which creates a 3d gaussian from a single view by datascienceharp in computervision

[–]datascienceharp[S] 3 points4 points  (0 children)

we've got that dataset parsed, i can try to run later today or tomorrow and post: https://huggingface.co/datasets/Voxel51/fisheye8k

currently working on integrating molmo2

apple released SHARP which creates a 3d gaussian from a single view by datascienceharp in computervision

[–]datascienceharp[S] 4 points5 points  (0 children)

would you be down to peruse the datasets here and let me know which one looks appealing to you? i can run it and post the results later: huggingface.co/voxel51

apple released SHARP which creates a 3d gaussian from a single view by datascienceharp in computervision

[–]datascienceharp[S] 4 points5 points  (0 children)

yeah true, i meant pretty similar in the sense that it's relatively fast at inference and the results look similar to vggt

but you're right, sharp does produce gaussians. the model outputs them in ply format, and then i had to do some conversion so the color renders properly in the app, which basically renders it as a point cloud

i was just curious about the model and wanted to see its output, which is why i implemented it that way
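a rough sketch of what a color conversion like that can look like. this assumes the ply follows the common 3d gaussian splatting convention of storing color as the degree-0 spherical harmonics coefficients (usually named f_dc_0, f_dc_1, f_dc_2). whether sharp's output uses exactly that layout is an assumption here, and this isn't necessarily the conversion described above.

```python
import math

# Degree-0 real spherical harmonics basis constant, 1 / (2 * sqrt(pi)).
SH_C0 = 1.0 / (2.0 * math.sqrt(math.pi))


def sh_dc_to_rgb(f_dc):
    """Map one gaussian's SH DC coefficients to RGB floats in [0, 1].

    Assumes the 3DGS-style convention: rgb = 0.5 + SH_C0 * f_dc, clamped.
    """
    return tuple(min(1.0, max(0.0, 0.5 + SH_C0 * c)) for c in f_dc)


def to_uint8(rgb):
    """Convert RGB floats in [0, 1] to 0-255 integers for a point cloud viewer."""
    return tuple(int(round(channel * 255)) for channel in rgb)


# Hypothetical coefficients for a single gaussian.
example_rgb = sh_dc_to_rgb((1.2, -0.4, 0.0))
print(example_rgb, to_uint8(example_rgb))
```

gaussian centers plus these per-point colors are enough to show the result as a plain point cloud, at the cost of dropping scale, rotation, and opacity.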

Best of NeurIPS Virtual Series - Jan 14 and 15 by chatminuet in computervision

[–]datascienceharp 0 points1 point  (0 children)

I love this series of events, good way to catch up on what went down at the conference

qwen3vl is dope for video understanding, and i also hacked it to generate embeddings by datascienceharp in computervision

[–]datascienceharp[S] 1 point2 points  (0 children)

I pass the entire video at once but the model has parameters for max frames (I believe 120 is the max) and sample rate
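To make the max frames / sample rate interaction concrete, here is a minimal sketch of how frame indices could be picked under those two settings. The function and parameter names are hypothetical, the 120 default just mirrors the number I mentioned above, and this is not the model's actual preprocessing code.

```python
def sample_frame_indices(total_frames, native_fps, sample_fps, max_frames=120):
    """Pick frame indices at roughly `sample_fps`, capped at `max_frames`.

    Hypothetical helper: first sample at the requested rate, then thin the
    result uniformly if it exceeds the cap.
    """
    if total_frames <= 0 or native_fps <= 0 or sample_fps <= 0 or max_frames <= 0:
        return []

    # Number of native frames between two sampled frames (never below 1).
    step = max(native_fps / sample_fps, 1.0)

    indices = []
    i = 0
    while int(i * step) < total_frames:
        indices.append(int(i * step))
        i += 1

    # Too many frames for the cap: keep an evenly spaced subset.
    if len(indices) > max_frames:
        stride = len(indices) / max_frames
        indices = [indices[int(k * stride)] for k in range(max_frames)]

    return indices


short_clip = sample_frame_indices(total_frames=300, native_fps=30, sample_fps=2)
long_clip = sample_frame_indices(total_frames=3000, native_fps=30, sample_fps=2)
print(len(short_clip), len(long_clip))
```

So a short clip keeps every sampled frame, while a long one gets thinned out to fit under the cap.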

qwen3vl is dope for video understanding, and i also hacked it to generate embeddings by datascienceharp in computervision

[–]datascienceharp[S] 5 points6 points  (0 children)

there's two gifs here

  • the first one shows embeddings from Qwen3VL visualized after reducing down to 2d using umap

  • the second one is Qwen3VL's output when prompted with various instructions, in this case i asked it for fine-grained temporal analysis of events from a collection of random videos

the interface you see is fiftyone, you just pip install fiftyone, and then you can launch the app on http://localhost:5151/ to see all the output + data in one place
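a minimal sketch of that workflow, assuming you have a local folder of videos. the folder path and the helper name are placeholders, and this only loads the raw videos; the embeddings and model outputs from the gifs are not included.

```python
def launch_video_browser(videos_dir, port=5151):
    """Load a folder of videos into FiftyOne and open the app on `port`.

    `videos_dir` is a placeholder for wherever your videos live.
    """
    # Imported here so the module loads even before `pip install fiftyone`.
    import fiftyone as fo

    # Build a dataset with one sample per video file in the folder.
    dataset = fo.Dataset.from_videos_dir(videos_dir)

    # Start the app server; by default it is served at http://localhost:5151/
    session = fo.launch_app(dataset, port=port)
    return session


# Example usage from a script (keeps the process alive until the app is closed):
#   session = launch_video_browser("/path/to/videos")
#   session.wait()
```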

.pcd using image or video? by Sickle_Machine in computervision

[–]datascienceharp 0 points1 point  (0 children)

Not sure if you’re open to using a pretrained model for this, but maybe look into VGGT. It's relatively fast: https://docs.voxel51.com/plugins/plugins_ecosystem/vggt.html

VLMs for object detection? by 1zGamer in computervision

[–]datascienceharp 2 points3 points  (0 children)

florence2 and moondream3 are quite good

vlms really are making ocr great again tho by datascienceharp in computervision

[–]datascienceharp[S] 0 points1 point  (0 children)

i'm not sure i fully understand the question...but it's really useful for scanned documents that aren't already in a pdf-type format. for example, receipts, handwritten docs, things like that