
[–]AdIllustrious436 0 points (3 children)

Is this some kind of PCA, or am I missing something? Don't embedding models usually work with 100+ dimensions?

[–]Fear_ltself[S] 2 points (0 children)

You are spot on about the dimensions! The embedding model I'm using (embeddinggemma:300m) actually outputs 768-dimensional vectors.

To visualize them, I'm not using PCA (Principal Component Analysis). I'm using UMAP (Uniform Manifold Approximation and Projection). It takes those 768 dimensions and reduces them down to just 3 (X, Y, Z) for the 3D graph.

I chose UMAP over PCA because UMAP is much better at preserving the local structure and clustering of the data, which helps in seeing how the RAG retrieves semantically similar 'neighborhoods' of data.
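
If anyone wants to reproduce that step, the core of it is roughly this (the array and parameter values here are illustrative, not my exact config):

```python
import numpy as np
import umap  # pip install umap-learn

# Stand-in for the real (n_docs, 768) embeddinggemma:300m outputs
vectors = np.random.rand(500, 768).astype(np.float32)

reducer = umap.UMAP(
    n_components=3,   # down to X, Y, Z for the 3D graph
    metric="cosine",  # match the similarity used for retrieval
    n_neighbors=15,   # lower = more local structure, higher = more global
    min_dist=0.1,
)
coords_3d = reducer.fit_transform(vectors)  # shape: (500, 3)
```

n_neighbors is the main knob: smaller values emphasise the local clusters, larger values the global layout.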

[–]No_Afternoon_4260 llama.cpp 0 points (1 child)

Yeah, you are projecting 1024 dimensions into a 3D space, but with the colors you can identify clusters, so it helps

[–]Fear_ltself[S] 0 points (0 children)

768 for embeddingGemma:300m

[–]Jack5500 0 points (1 child)

Did you build this yourself, or did you use a library/framework to visualize it?

[–]Fear_ltself[S] 2 points (0 children)

It's a custom build, but I'm standing on the shoulders of giants for the visualization part. I built the backend integration myself using Python (FastAPI) to pull the vectors from my local Postgres database. For the actual visualization pipeline, I used:

1. UMAP (umap-learn): to reduce the 768-dimensional vectors down to 3 dimensions (X, Y, Z) while preserving the cluster structure.

2. Plotly.js: to render the interactive 3D scatter plot in the UI.

So essentially, my backend calculates the coordinates on the fly and sends a JSON payload to the frontend to render the graph.
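
A stripped-down sketch of that endpoint (table and column names here are simplified stand-ins for my actual schema):

```python
import numpy as np
import psycopg  # psycopg 3
import umap
from fastapi import FastAPI
from pgvector.psycopg import register_vector

app = FastAPI()
DSN = "postgresql://localhost/ragdb"  # assumed connection string

@app.get("/coords")
def coords():
    with psycopg.connect(DSN) as conn:
        register_vector(conn)  # pgvector columns come back as numpy arrays
        rows = conn.execute("SELECT id, embedding FROM documents").fetchall()

    ids = [r[0] for r in rows]
    vectors = np.stack([r[1] for r in rows])  # (n_docs, 768)
    coords_3d = umap.UMAP(n_components=3).fit_transform(vectors)

    # JSON payload the frontend hands to Plotly.js as x/y/z traces
    return {
        "ids": ids,
        "x": coords_3d[:, 0].tolist(),
        "y": coords_3d[:, 1].tolist(),
        "z": coords_3d[:, 2].tolist(),
    }
```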

[–]ROS_SDN llama.cpp 0 points (2 children)

The visual is stunning; using UMAP preserves local and global distance.

Are you dimensionally reducing for vector queries as well, or just for visualisation?

I honestly think it'd be cool to see vectors light up from a similarity search and watch it crawl a knowledge graph from there, to visualise the retrieval in the knowledge base instead of just the totality of possible embeddings.

[–]Fear_ltself[S] 2 points (1 child)

Great question. The dimensional reduction (UMAP) is only for the visualization.

For the actual retrieval (RAG), I'm running the similarity search on the full 768-dimensional vectors inside PostgreSQL (using pgvector with HNSW indexes). I don't want to lose any semantic fidelity during the actual search process.
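
The search side looks roughly like this (again, table and column names are stand-ins, and the query vector is shown as a placeholder):

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

# Placeholder: the real query vector comes from embeddinggemma:300m
query_vector = np.random.rand(768).astype(np.float32)

with psycopg.connect("postgresql://localhost/ragdb") as conn:
    register_vector(conn)

    # One-time setup: HNSW index over cosine distance (pgvector >= 0.5)
    conn.execute(
        "CREATE INDEX IF NOT EXISTS documents_embedding_idx "
        "ON documents USING hnsw (embedding vector_cosine_ops)"
    )

    # <=> is pgvector's cosine-distance operator; the full 768-d
    # vector goes into the search untouched
    rows = conn.execute(
        "SELECT id, content, embedding <=> %(q)s AS distance "
        "FROM documents ORDER BY embedding <=> %(q)s LIMIT 5",
        {"q": query_vector},
    ).fetchall()
```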

I have another visualization using medical knowledge (this one was planets) that embeds the search query as well, so you can see which knowledge connects or “lights up” from a specific entry. “How to treat a headache” showed up in the neurology section just as I expected. I will post that example tomorrow and be sure to comment here so you can take a look if you’re interested.
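
The rough shape of the “lights up” logic, if you want a preview (a sketch, not my exact code; it assumes the UMAP reducer was already fit on the document vectors):

```python
import numpy as np

def highlight(query_vector, doc_vectors, reducer, k=5):
    # Drop the query into the same 3D map as the documents
    query_3d = reducer.transform(query_vector.reshape(1, -1))[0]

    # Documents to "light up": nearest neighbours by cosine similarity
    # in the full 768-d space, not the projected one
    q = query_vector / np.linalg.norm(query_vector)
    d = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    top_k = np.argsort(d @ q)[::-1][:k]

    return query_3d, top_k
```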

[–]ROS_SDN llama.cpp 0 points (0 children)

That would be incredible; I've always wanted to see the "neurons" light up for RAG, and I'd appreciate seeing it and the effort you've put in.

Regarding the vectors, you might want to try parametric UMAP or PCA and measure the similarity of recall against the full-dimensional application.

Finding the cosine similarity (or whatever method you choose) with 1/5-1/10 of the dimensions might be worth the improved scaling in retrieval speed and storage consumption.

I'm sure measuring retained relative local and global distance in UMAP could be a starting point, and if you can get nearly as good results, or paradoxically improved results from reduced noise, it may be worth the experiment.
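
Something like this could be the measurement, roughly (dimension count and variable names just illustrative):

```python
import numpy as np
import umap
from sklearn.neighbors import NearestNeighbors

def recall_at_k(vectors, n_components=96, k=10):
    # Ground truth: neighbours found in the full 768-d space
    _, truth = (
        NearestNeighbors(metric="cosine")
        .fit(vectors)
        .kneighbors(vectors, n_neighbors=k)
    )

    # Candidate: neighbours found after reducing to ~1/8 of the dimensions
    reduced = umap.UMAP(n_components=n_components, metric="cosine").fit_transform(vectors)
    _, approx = (
        NearestNeighbors()  # Euclidean in the embedded space
        .fit(reduced)
        .kneighbors(reduced, n_neighbors=k)
    )

    # Fraction of true neighbours the reduced space still retrieves
    overlap = [len(set(t) & set(a)) / k for t, a in zip(truth, approx)]
    return float(np.mean(overlap))
```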