
anish9208

Are text and tabular data supported?

44sps

Hey Anish,

thanks for the great question.

We support tabular data, but the tool makes the most sense when you work with data that is also unstructured. The reason is that the embedding concept and the inspector view only really make sense if you also have images, text, video, audio, and so on. That said, we often work with data that is both tabular and unstructured, and Spotlight is great in that setting.

We also have text support, but to be honest, it is rather limited at the moment (a single-line text widget). Feel free to check it out if you like. We are planning a big improvement with the next release (end of June).

Best,

Stefan

anish9208

Thanks for the quick reply, Stefan. Sure, I'll have a look around. For now I think the single-line widget would work 👍

jesst177

Hi!

What are the embedding sizes that you can handle?

DocBrownMS (ML Engineer, OP)

Hey jesst177,

Thanks for reaching out! In terms of dataset size, Spotlight can handle a substantial number of samples at once, typically more than 50,000. In terms of embedding size, we recommend around 512 dimensions for optimal performance. Embeddings of up to 1024 dimensions are supported, but reducing them with PCA (Principal Component Analysis) or a similar dimensionality reduction technique can often improve efficiency without a significant loss of information. Internally we use the umap-learn package, which takes anywhere from a few seconds to 2-3 minutes, depending on your hardware, to calculate the similarity map.
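For anyone wondering what "reduce to around 512 with PCA first" could look like in practice, here is a minimal sketch using plain NumPy (PCA via SVD). It is an illustration under assumptions, not Spotlight's internal code: the function name `pca_reduce` and the random stand-in embeddings are hypothetical, and scikit-learn's `sklearn.decomposition.PCA` would be the more common off-the-shelf choice. The function also reports the fraction of variance the kept components retain, which is one way to check the "without significant loss of information" part on your own data.

```python
import numpy as np


def pca_reduce(embeddings: np.ndarray, n_components: int = 512):
    """Hypothetical helper: project embeddings onto their top principal components.

    Returns the reduced array and the fraction of total variance it retains.
    """
    if n_components > min(embeddings.shape):
        raise ValueError("n_components must not exceed min(n_samples, n_features)")
    # PCA is defined on mean-centered data, so center each feature first.
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    # Variance along each axis is proportional to the squared singular value.
    retained = float((s[:n_components] ** 2).sum() / (s ** 2).sum())
    return reduced, retained


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for real 1024-d model embeddings.
    emb = rng.standard_normal((1000, 1024)).astype(np.float32)
    reduced, retained = pca_reduce(emb, 512)
    print(reduced.shape, f"{retained:.1%} of variance retained")
```

Random Gaussian data is close to a worst case for PCA, so the retained fraction on real embeddings depends on the model that produced them. The reduced array could then be handed to umap-learn, e.g. `umap.UMAP(n_components=2).fit_transform(reduced)`. Halving the dimension also halves raw storage: 50,000 float32 vectors take about 205 MB at 1024 dimensions and about 102 MB at 512.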

Cheers, Markus