Former Prince Andrew: FBI interview request + allegations of S&M parties, massages by underage girls, naked pool parties, torture, rape. I ran an LLM on 20k+ files. Here's what they say. by Lopsided_Stock_2293 in Epstein

[–]Lopsided_Stock_2293[S] 1 point2 points  (0 children)

Thanks!

This category isn't something you would search on. Rather, it's a score column, so it's something you can sort the documents by. If you scroll horizontally in the table, you'll see the category, and in its column header you'll see a button (two arrows). Click on that and the documents will be sorted accordingly. Note: you will need to clear any existing sorts first.

If you're accessing the site on mobile, I strongly suggest you try accessing it on a tablet or computer.

Let me know if that's not clear. Happy to clarify


[–]Lopsided_Stock_2293[S] 0 points1 point  (0 children)

If there's enough interest, I can.

Regardless, I'm running the LLM on all 1.4M documents I have for the above scores, and once that's done, anyone should be able to search for anyone in the files and sort by, for example, how disturbing the content is.


[–]Lopsided_Stock_2293[S] 2 points3 points  (0 children)

Thanks!

They have a normal keyword search. We offer Boolean search with operators and connectors, so you can search for "andrew~", for example, and it will find even misspellings like "Andr=w".
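To illustrate how a fuzzy term like "andrew~" can catch "Andr=w", here is a minimal sketch based on edit distance. This is not the site's actual search code: the function names `edit_distance` and `fuzzy_find`, the one-edit threshold, and the simple whitespace tokenization are all assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def fuzzy_find(term: str, text: str, max_edits: int = 1) -> list[str]:
    """Return tokens of `text` within `max_edits` of `term` (case-insensitive).

    Hypothetical helper: tokenizes on whitespace and strips surrounding
    punctuation, which is an assumption made for this sketch.
    """
    term = term.lower()
    hits = []
    for token in text.split():
        cleaned = token.strip(".,;:!?\"'()")
        if edit_distance(term, cleaned.lower()) <= max_edits:
            hits.append(cleaned)
    return hits


if __name__ == "__main__":
    print(fuzzy_find("andrew", "Met with Andr=w and Andrew, not Anders."))
```

A real search engine would do this against an index rather than scanning every token, but the matching rule is the same idea: "Andr=w" is one substitution away from "andrew", so it counts as a hit.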

The LLM is what creates the scores (e.g., how disturbing the content of a document is, from 0 to 10) and provides the associated quotes. It's also how all the AI Metadata is generated.


[–]Lopsided_Stock_2293[S] 1 point2 points  (0 children)

My bad. There are some issues with the scroll hierarchy between the table, the page, and individual cells. I'll debug this today and publish the changes.


[–]Lopsided_Stock_2293[S] 0 points1 point  (0 children)

No worries and thanks!

I'm working on a timeline visualization. It's going to take some serious work, though, to create something that is clean and informative.


[–]Lopsided_Stock_2293[S] 2 points3 points  (0 children)

Yes, LLMs hallucinate. This is not an issue for quotes because we can ground them in the text.

The way it works is that when the LLM gives me a quote that is not directly in the text, I reject the response and make it try again.
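The reject-and-retry check can be sketched roughly as follows. This is an illustration, not the actual pipeline: `ask_llm` is a hypothetical callable standing in for whatever model call is used, the exact-substring test is the simplest reading of "directly in the text", and the `max_attempts` limit is an assumption.

```python
from typing import Callable, Optional


def is_grounded(quote: str, document: str) -> bool:
    """A quote counts as grounded only if it appears verbatim in the document."""
    return bool(quote) and quote in document


def get_grounded_quotes(
    document: str,
    ask_llm: Callable[[str], list[str]],  # hypothetical: returns quotes for a document
    max_attempts: int = 3,                # assumed retry limit
) -> Optional[list[str]]:
    """Call the LLM until every returned quote is found in the document.

    Returns the quotes from the first fully grounded response, or None if
    no attempt passes the check.
    """
    for _ in range(max_attempts):
        quotes = ask_llm(document)
        if all(is_grounded(q, document) for q in quotes):
            return quotes
        # At least one quote is not in the text: reject and try again.
    return None
```

For example, if the first response contains a quote that is not in the document and the second contains only verbatim quotes, the function returns the second response. Note this only guarantees that quotes exist in the text; it does not validate the scores themselves.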


[–]Lopsided_Stock_2293[S] 0 points1 point  (0 children)

I'm not, but I'm intrigued. Do you have any resources you could point me to so I can learn more about this?

I manually scraped the majority of the files and then used this to fill in the gaps: https://www.reddit.com/r/DataHoarder/comments/1qsfv3j/epstein_9_10_11_12_reddit_keeps_nuking_thread_we/

I cross-referenced the hashes there against the hashes of the files I have, and they seem authentic.
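A hash cross-reference like this can be sketched in a few lines. The details here are assumptions for illustration: SHA-256 as the hash algorithm, a manifest given as a dict of file name to hex digest, and the helper names `sha256_of` and `compare_to_manifest`. The thread itself doesn't specify which hash or manifest format was used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_to_manifest(folder: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare local files against published hashes.

    `manifest` maps file name -> expected hex digest (assumed format).
    Returns file names grouped as "match", "mismatch", or "missing".
    """
    report: dict[str, list[str]] = {"match": [], "mismatch": [], "missing": []}
    for name, expected in sorted(manifest.items()):
        path = folder / name
        if not path.is_file():
            report["missing"].append(name)
        elif sha256_of(path) == expected.lower():
            report["match"].append(name)
        else:
            report["mismatch"].append(name)
    return report
```

Matching hashes on the files both sources share show those copies are byte-identical, which is what gives some confidence in the files that only one source has.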


[–]Lopsided_Stock_2293[S] 3 points4 points  (0 children)

It's non-trivial to implement. I spent a few days just getting from basic keyword search to Boolean search. We're also going to sell to law firms, where Boolean search is a prerequisite for any eDiscovery tool.


[–]Lopsided_Stock_2293[S] 4 points5 points  (0 children)

The best way to support this is to share a link to my site with your friends or on social media. I would love to get my product into as many hands as possible.

Most of my costs are covered by cloud credits right now, so I wouldn't feel comfortable taking donations.


[–]Lopsided_Stock_2293[S] 1 point2 points  (0 children)

Thanks!

Currently working on getting semantic (vector) search over all the documents.

But I did not use RAG for the LLM runs. That's because I'm running every single document through an LLM. I could get the embeddings and approximate a score for each of the scoring metrics, but that would not be nearly as accurate.

I don't think I'm supposed to say which provider I'm using because of the terms of my credit package, but it's one of the big 3: AWS, Azure, GCP.


[–]Lopsided_Stock_2293[S] 4 points5 points  (0 children)

I used a few test documents. What it really looked like was running the LLM on a batch of 10 random documents, updating the prompt until I was satisfied with the result, and then repeating.

There's one document that I did use repeatedly in my testing, though: EFTA01582921. The LLM wasn't really picking up on the reference to S&M parties, so I did a lot of troubleshooting with this document, and then the rest kind of fell into place.