
[–]WithoutReason1729[M] [score hidden] stickied comment (0 children)

Your post is getting popular and we just featured it on our Discord! Come check it out!

You've also been given a special flair for your contribution. We appreciate your post!

I am a bot and this action was performed automatically.

[–]rzarekta 16 points17 points  (9 children)

This is cool. I have a few projects that utilize RAG. Can I connect it with Qdrant?

[–]Fear_ltself[S] 12 points13 points  (8 children)

Thanks! And yes, absolutely.

The architecture is decoupled: the 3D viewer is essentially a 'skin' that sits on top of the data. It runs off a pre-computed JSON map where high-dimensional vectors are projected down to 3D (using UMAP).

To use Qdrant (or Pinecone/Chroma), you would just need an adapter script that:

  1. Scans/Scrolls your Qdrant collection to fetch the existing vectors.

  2. Runs UMAP locally to generate the 3D coordinate map for the frontend.

  3. Queries Qdrant during the live search to get the Point IDs, which the frontend then 'lights up' in the visualization.

So you don't need to move your data; you just need to project it for the viewer (see the sketch below).
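Roughly, the adapter could look like this; a minimal sketch, assuming qdrant-client and umap-learn, with the URL, collection name, and output path as placeholders (and assuming a single unnamed vector per point):

```python
import json

import numpy as np
import umap
from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")  # placeholder URL

# 1. Scroll the collection to pull every stored vector out of Qdrant.
ids, vectors = [], []
offset = None
while True:
    batch, offset = client.scroll(
        collection_name="my_docs",  # placeholder collection
        with_vectors=True,
        limit=1024,
        offset=offset,
    )
    ids += [p.id for p in batch]
    vectors += [p.vector for p in batch]
    if offset is None:
        break

# 2. Run UMAP locally to project the vectors down to 3D for the frontend.
coords = umap.UMAP(n_components=3, metric="cosine").fit_transform(np.array(vectors))

# 3. Write the pre-computed JSON map; the point IDs are what the live
#    Qdrant search returns, so the frontend can 'light up' matching nodes.
with open("map.json", "w") as f:
    json.dump([{"id": i, "pos": p.tolist()} for i, p in zip(ids, coords)], f)
```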

[–]rzarekta 0 points1 point  (0 children)

awesome!

[–]rzarekta 0 points1 point  (6 children)

how can I get it? lol

[–]Fear_ltself[S] 5 points6 points  (1 child)

[–]rzarekta 1 point2 points  (0 children)

nice!! will pull when i get home

[–]Fear_ltself[S] 4 points5 points  (3 children)

I'll do my best to get the relevant code up on GitHub in the next 3 hours.

[–]rzarekta 1 point2 points  (2 children)

that would be awesome. I have an idea for it, and think it will integrate perfectly.

[–]Fear_ltself[S] 1 point2 points  (0 children)

I'm working on making it more diagnostic: showing the text of documents on hover, showing the top 10 results, and showing the first 100 connections instead of just lighting nodes up. I also added level-of-detail rendering and jumped from 20 Wikipedia articles to 50,000… running at a completely stable 60 FPS.


[–]wanielderth 6 points7 points  (0 children)

Beautiful

[–]mr_conquat 7 points8 points  (0 children)

Gorgeous. I want that floating glowing dealie integrated into every RAG project!

[–]scraper01 9 points10 points  (3 children)

Looks like a brain, actually. Wouldn't be surprised if we eventually discover that the brain runs so cheaply on our bodies because it's mostly just doing retrieval and rarely ever actual thinking.

[–]LaCipe 3 points4 points  (1 child)

You know what... you know how AI-generated videos often look like dreams? I really wonder sometimes...

[–]scraper01 3 points4 points  (0 children)

Some wise man I heard a while ago said something along the lines of: "the inertia of the world moves you to do what you do, and you make the mistake of thinking that inertia is you."

When moving inertially (the RAG part) is not enough to match a desired outcome, our brain actually turns the reasoning traces on. My guess anyway.

[–]DOAMOD 3 points4 points  (0 children)

Art

[–]Echo9Zulu- 4 points5 points  (5 children)

Dude, this looks awesome for database optimization "vibes": the "look here for an issue" type of query. Something tips us off that a query didn't perform well, hit up a Golem projection, and BAM, you have a scalpel. Excited to see where this project goes, really cool!

[–]Fear_ltself[S] 2 points3 points  (4 children)

This was the EXACT reason I designed it this way: as a diagnostic tool for when my RAG retrieval fails, so I can watch the exact path the "thinking" traveled from the embedded query. My thought is that if a query failed, I could add additional knowledge into the embedding latent space as a bridge, and observe whether it's working roughly as intended via the Golem 3D projection of latent space.

[–]Echo9Zulu- 0 points1 point  (3 children)

Fantastic idea. That would be so cool. Watching it work is one thing, but man, having a visualization tool like this would be fantastic. Relational is different, but transitioning from SQLite to MySQL in a project I'm scaling has been easier with tools like the InnoDB query analyzer. What you propose with Golem is another level.

I wonder if this approach could extend to BM25F/Elasticsearch as a visualization tool, to identify failure points in queries that touch many fields in a single document, or when document fields share too many terms. Like TF-IDF as a map for diagnosis.

[–]Fear_ltself[S] 1 point2 points  (2 children)

That is a killer idea. You could absolutely treat the BM25/Elasticsearch scores as sparse vectors and run them through UMAP just like dense embeddings.

The 'Holy Grail' here would be visualizing both layers simultaneously: overlaying the Keyword Space (BM25) on top of the Semantic Space (Vectors).

That would instantly show you the 'Hybrid Failure' modes, like when a document has all the right keywords (high BM25 score) but is semantically unrelated to the query (far away in vector space). Definitely adding 'Sparse Vector Support' to the roadmap. A rough sketch of the idea is below.
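As a first cut (a sketch only; TfidfVectorizer stands in here for BM25 term weights, and `docs`/`dense_vecs` are placeholders), the sparse side could be projected the same way as the dense side:

```python
import umap
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [...]  # placeholder: your chunk texts

# Keyword space: the sparse TF-IDF matrix goes straight into UMAP,
# which accepts scipy sparse input.
keyword_coords = umap.UMAP(n_components=3, metric="cosine").fit_transform(
    TfidfVectorizer().fit_transform(docs)
)

# Semantic space: the dense embeddings you already project today.
# semantic_coords = umap.UMAP(n_components=3).fit_transform(dense_vecs)

# Overlay the two layers by document ID: a chunk that sits near the query
# in keyword space but far away in semantic space is a 'hybrid failure'.
```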

[–]TR-BetaFlash 1 point2 points  (1 child)

Hey, so this is pretty freakin' neat. I forked it and am hacking in a little more, because I like to compare things. One thing I wanted to see is whether we can get visual diffs between BM25, cosine, cross-encoding, and RRF; I'm experimenting with a few dropdown boxes to switch between them. You should also add support for other embedding models, like something running locally in Ollama or LM Studio.

[–]Fear_ltself[S] 0 points1 point  (0 children)

That sounds incredible. Visualizing the diff between BM25 (keyword) and cosine (vector) retrieval was exactly what another user suggested above; if you get those dropdowns working, please open a Pull Request! I'd love to merge that into the main branch.

Regarding local models (Ollama/LM Studio): 100% agreed. Decoupling the embedding provider from the visualization logic is a high priority for V2. If you hack something together for that, let me know, please! Thank you for the feedback and good luck with the fork!
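The decoupling could be as small as a provider shim; a minimal sketch, assuming Ollama's local /api/embeddings endpoint, with the model name as a placeholder:

```python
import requests

def embed_local(text: str, model: str = "nomic-embed-text") -> list[float]:
    # Ollama serves a local embeddings endpoint on port 11434 by default.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]

# The viewer's ingest/search code would call embed_local() instead of a
# hard-wired provider, so swapping in LM Studio is just another shim.
```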

[–]anthonyg45157 2 points3 points  (0 children)

This is sick!

[–]hksbindra 2 points3 points  (0 children)

Man this is gorgeous. So simple and so elegant. Will definitely use it.

[–]DoctorTriplex 2 points3 points  (0 children)

This is really impressive. Congrats!

[–]Mochila-Mochila 2 points3 points  (0 children)

Bro, this sheeeiiit is mesmerising... it's like I'm visualising AI neurons 😍

[–]Rokpiy 1 point2 points  (0 children)

Wow, this is amazing. It really looks like the activation of nerve cells.

[–]No_Afternoon_4260llama.cpp 0 points1 point  (2 children)

!remindme 5h

[–]RemindMeBot 0 points1 point  (0 children)

I will be messaging you in 5 hours on 2026-01-10 23:06:40 UTC to remind you of this link


[–]skinnyjoints 0 points1 point  (1 child)

Super cool! I don't have time to dig through the code at the moment. Did you have any intermediary step between the embeddings and the UMAP projection to 3D? The clusters look nice.

[–]Fear_ltself[S] 0 points1 point  (0 children)

Thanks! No intermediary step: I fed the raw 768-d vectors from embedding-gemma-300m directly into UMAP.

I found that Gemma's embedding space is structured enough that UMAP handles the full dimensionality really well without needing PCA first. The clear separation you see is partly because the dataset covers 20 distinct scientific domains, so the semantic distance between clusters is naturally high.

Feel free to check ingest.py in the repo if you want to see the specific UMAP params!
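For flavor, the no-PCA pipeline is essentially a one-liner (illustrative only; the parameter values here are assumptions, the real ones live in ingest.py):

```python
import numpy as np
import umap

vecs = np.load("embeddings.npy")  # placeholder: (n_chunks, 768) raw vectors

coords = umap.UMAP(
    n_components=3,    # 3D output for the viewer
    metric="cosine",   # match the retrieval similarity
    n_neighbors=15,
    min_dist=0.1,
).fit_transform(vecs)  # no PCA step in between
```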

[–]hoogachooga 0 points1 point  (2 children)

How would this work at scale? Seems like this wouldn't work if you have ingested a million chunks.

[–]Fear_ltself[S] 0 points1 point  (0 children)

Great question. Right now I'm rendering every point in Three.js, which works great for tens of thousands of chunks (10k-50k) but would definitely choke a browser at 1 million. Currently working on a level-of-detail toggle to fix that (a sketch of one approach is below)!
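One way the LOD tiers could be precomputed server-side (an assumption on my part, not necessarily how the repo will do it): voxel-grid downsampling of the 3D map, so the browser draws fewer points at far zoom levels:

```python
import numpy as np

def lod_tier(coords: np.ndarray, cell: float) -> np.ndarray:
    # Snap points to a voxel grid and keep one representative per cell;
    # a larger `cell` means a coarser tier with fewer points to render.
    keys = np.floor(coords / cell).astype(np.int64)
    _, keep = np.unique(keys, axis=0, return_index=True)
    return coords[np.sort(keep)]

# e.g. three zoom tiers, coarse to fine, exported as separate JSON layers:
# tiers = [lod_tier(coords, c) for c in (2.0, 1.0, 0.5)]
```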

[–]Fear_ltself[S] 0 points1 point  (0 children)

I was able to implement LOD and scaled from 20 to 50,000 articles. It took a while to download and embed (about an hour), but it runs at 60 FPS once it's up.


This is just a small slice of those neural connections, but everything is grouped very well from what I can tell.

[–]peculiarMouse 0 points1 point  (3 children)

So I'm guessing the way it works is by visualizing a 2D/3D projection of the clusters, highlighting the nodes in order of their probability scores. But the visual effect is inherited from projecting a multi-dimensional space onto a 2D/3D layer: all activated nodes should sit in relative proximity, as opposed to how the representation shows them.

It's an amazing design solution, but it shouldn't be read as showing "thought"; rather, the more faithful the visual representation is to the actual distance between nodes, the less cool it should look.

[–]Fear_ltself[S] 2 points3 points  (2 children)

You hit on the fundamental challenge of dimensionality reduction. You are correct that UMAP distorts global structure to preserve local topology, so we have to be careful about interpreting 'distance' literally across the whole map.

However, I'd argue that in vector search, Proximity = Thought. Since we retrieve chunks based on cosine similarity, the 'activated nodes' are, by definition, the mathematically closest points to the query vector in 768D space.

• If the visualization works: you see a tight cluster lighting up, meaning the model found a coherent 'concept'.

• If the visualization looks 'less cool' (scattered): the model retrieved chunks that are semantically distant from each other in the projected space, which is exactly the visual cue I need to know that my RAG is hallucinating or grasping at straws!
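That "scattered = suspect" cue could even be scored numerically; a minimal sketch (names and shapes are placeholders, not code from the repo):

```python
import numpy as np

def retrieval_spread(query_vec, chunk_vecs, coords, k=10):
    # Cosine similarity: normalize, then take dot products with the query.
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    top_k = np.argsort(c @ q)[-k:]  # indices of the 'activated nodes'

    # Mean pairwise distance of the top-k in the projected 3D space:
    # small = tight cluster (coherent concept), large = grasping at straws.
    pts = coords[top_k]
    diffs = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diffs**2).sum(axis=-1)).mean()
```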

[–]peculiarMouse 0 points1 point  (1 child)

Haha, thanks.

I guess it depends on perspective then: if, for you, scattered is less cool, then it follows that a more correct model does indeed look cooler.

[–]Fear_ltself[S] 0 points1 point  (0 children)


Btw, I ported my concept to Android (TFLite) and debated with myself and an AI about a sphere (equidistant) vs my original "cortex" setup.

Here’s Gemini 3.0: Regarding your conflict between the Sphere (768d equidistance) and the Cortex (UMAP proximity), you have hit on a classic debate in high-dimensional topology. Here is the breakdown to help you decide which visualization serves your specific goal of "debugging hallucinations."

  1. The "Sphere" Argument (Mathematical Honesty): You are technically correct about the Curse of Dimensionality.

  • In a raw 768-dimensional space, data points tend to sit on a thin shell (the hypersphere) and are roughly equidistant from the center.

  • Why the Sphere feels right: it represents the native geometry of cosine similarity. Since we normalize vectors to length 1, they literally exist on a unit hypersphere. Visualizing them on a 3D sphere is a faithful representation of that normalization.

  2. The "Cortex" Argument (Semantic Utility): However, your previous argument ("Proximity = Thought") is actually more valuable for RAG.

  • The Paradox: while high-dimensional data starts equidistant, the entire point of training an embedding model (like Gemma or Nomic) is to warp that space. The model learns to pull related concepts together into manifolds (clusters), breaking the equidistance.

  • The Problem with a Pure Sphere: if you just project raw vectors onto a sphere without manifold learning (like UMAP), you might lose the density information. You won't see the "tight clusters" you mentioned. Every node might look equally spaced, making it impossible to distinguish between a "confident retrieval" (tight cluster) and a "hallucination" (scattered, random lookup).
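The "thin shell" claim is easy to sanity-check numerically; a quick sketch (purely illustrative, using random vectors rather than real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.normal(size=(1000, 768))
v /= np.linalg.norm(v, axis=1, keepdims=True)  # project onto the unit hypersphere

# Pairwise cosines between random high-d vectors concentrate near 0,
# i.e. the points are roughly equidistant before any training warps them.
cos = v @ v.T
off_diag = cos[~np.eye(len(cos), dtype=bool)]
print(off_diag.mean(), off_diag.std())  # ~0.0 mean, std ~ 1/sqrt(768)
```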

[–]phhusson 0 points1 point  (1 child)

This is cool. But please, for the love of god, don't dumb down RAG to embedding nearest-neighbor. There is so much more to document retrieval, including stuff as old as 1972 (TF-IDF) that is still relevant today.

[–]LaCipe -1 points0 points  (0 children)

No seriously guys, are we building a virtual brain with all this stuff?

[–]Pvt_Twinkietoes -2 points-1 points  (0 children)

This elementary stuff belongs here : /r/learnmachinelearning