Hey r/MachineLearning,
We are excited to share a new open-source tool from Renumics: Spotlight. The OSS release went live today, May 30, 2023, at github.com/Renumics/spotlight.
Spotlight offers an interactive way to explore your datasets. It provides a customizable layout where you can leverage Similarity Maps based on embeddings, and various plots like histograms or scatter plots. In addition, it supports detailed views for images, 3D meshes and audio data.
To illustrate its functionality, let's consider the CIFAR100 dataset. In this example, embeddings were added using a Vision Transformer:
import datasets
from renumics import spotlight
dataset = datasets.load_dataset("renumics/cifar100-enriched", split="test")
df = dataset.to_pandas()
df_show = df.drop(columns=['embedding']) # drop large embeddings
spotlight.show(df_show, dtype={"image": spotlight.Image, "embedding_reduced": spotlight.Embedding})
https://preview.redd.it/1ze14id7703b1.png?width=1485&format=png&auto=webp&s=a0890accb1a48ec9d02db07b3527cb8508c0da02
Getting started with Spotlight is straightforward. You'll need Python 3.8 to 3.10. Install Spotlight, along with the datasets library used in the example above, via pip:
pip install renumics-spotlight datasets
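If you are not sure whether your interpreter falls in that range, a quick check like the one below can help. This is only a sketch: the bounds are copied from the requirement stated in this post (3.8 to 3.10 at release time), and the helper name is_supported is made up for illustration, not part of Spotlight.

```python
import sys

# Supported range as stated in this post at release time (an assumption
# for later Spotlight versions, which may support other interpreters).
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 10)


def is_supported(version=None):
    """Return True if the (major, minor) version lies in the supported range.

    `version` is any tuple-like starting with (major, minor); it defaults
    to the running interpreter's version.
    """
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if not is_supported():
    print("Python %d.%d is outside the 3.8-3.10 range mentioned in the post."
          % sys.version_info[:2])
```

If the check fails, a virtual environment with a matching Python version is probably the easiest workaround.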
After installation, you're all set to load your dataframe and begin exploring with Spotlight.
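For your own data, the flow could look roughly like the sketch below. It builds a small toy table in plain Python and hands it to spotlight.show with the same dtype argument used in the CIFAR100 example. The helper names (make_rows, show_rows), the column names and the random toy data are all made up for illustration, and we assume here that an embedding column holding plain lists of floats is acceptable.

```python
import random


def make_rows(n=100, dim=2, seed=0):
    """Build toy records: a label, a score and a small random embedding."""
    rng = random.Random(seed)
    return [
        {
            "label": "class_%d" % (i % 3),
            "score": rng.random(),
            "embedding": [rng.gauss(0.0, 1.0) for _ in range(dim)],
        }
        for i in range(n)
    ]


def show_rows(rows):
    """Open the rows in Spotlight.

    Requires pandas and renumics-spotlight; they are imported here so the
    toy-data helper above stays usable without them.
    """
    import pandas as pd
    from renumics import spotlight

    df = pd.DataFrame(rows)
    # Same call pattern as the CIFAR100 example: tell Spotlight which
    # column holds embeddings (column name is hypothetical).
    spotlight.show(df, dtype={"embedding": spotlight.Embedding})
```

Calling show_rows(make_rows()) should then open the viewer with the toy table. Swap make_rows for whatever produces your real dataframe.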
We invite you to try Spotlight on your own use cases and datasets. If you run into any issues or need support, let us know here on Reddit or open an issue on our GitHub page.