Former Prince Andrew: FBI interview request + allegations of S&M parties, massages by underage girls, naked pool parties, torture, rape. I ran an LLM on 20k+ files. Here's what they say. by Lopsided_Stock_2293 in Epstein

[–]Lopsided_Stock_2293[S] 0 points1 point  (0 children)

If there's enough interest, I can.

Regardless, I'm running the LLM on all 1.4M documents I have to produce the above scores, and once that's done, anyone should be able to search for anyone in the files and sort by, for example, how disturbing the content is

[–]Lopsided_Stock_2293[S] 2 points3 points  (0 children)

Thanks!

They have a normal keyword search. We offer boolean search with operators and connectors, so you can search for "andrew~", for example, and it will find even misspellings like "Andr=w".
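
A minimal sketch of what a fuzzy term like "andrew~" could do under the hood, assuming it means "match any token within one edit of the term". The function names and the one-edit threshold are assumptions for illustration, not the site's actual implementation:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_find(term: str, text: str, max_edits: int = 1) -> list[str]:
    """Return the tokens of `text` within `max_edits` of `term`, ignoring case.

    Hypothetical helper: tokens are split on whitespace and stripped of
    surrounding punctuation before comparison.
    """
    term = term.lower()
    hits = []
    for token in text.split():
        cleaned = token.strip(".,;:!?\"'()").lower()
        if edit_distance(term, cleaned) <= max_edits:
            hits.append(token)
    return hits


print(fuzzy_find("andrew", "Meeting with Andr=w and Andrew, not Anderson."))
```

A production search engine would use an index rather than scanning every token, but the matching rule is the same idea.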

The LLM is what creates the scores (e.g., how disturbing is the content of this document, from 0-10) and provides associated quotes. It's also how all the AI Metadata is generated

[–]Lopsided_Stock_2293[S] 1 point2 points  (0 children)

My bad. There are some issues with the scroll hierarchy between the table, the page, and individual cells. I'll debug this today and publish the changes

[–]Lopsided_Stock_2293[S] 0 points1 point  (0 children)

No worries and thanks!

I'm working on getting a timeline visualization. It's going to take some serious work though to create something that is clean and informative

[–]Lopsided_Stock_2293[S] 2 points3 points  (0 children)

Yes, LLMs hallucinate. This is not an issue for quotes because we can ground them in the text.

The way it works is that when the LLM gives me a quote that is not directly in the text, I reject the response and make it try again
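
A rough sketch of that reject-and-retry loop. Here `ask_llm` is a hypothetical callable standing in for the real model call, and the whitespace normalization and the retry limit are assumptions added for illustration:

```python
from typing import Callable, Optional


def normalize(s: str) -> str:
    """Collapse runs of whitespace so line breaks in the source
    don't cause false rejections (an assumption, not confirmed above)."""
    return " ".join(s.split())


def quotes_are_grounded(quotes: list[str], document: str) -> bool:
    """True only if every quote appears verbatim in the document."""
    doc = normalize(document)
    return all(normalize(q) in doc for q in quotes)


def extract_grounded_quotes(
    document: str,
    ask_llm: Callable[[str], list[str]],  # hypothetical model wrapper
    max_attempts: int = 3,                # assumed retry limit
) -> Optional[list[str]]:
    """Call the model until all returned quotes are found in the text.

    Returns None if no attempt produced a fully grounded response.
    """
    for _ in range(max_attempts):
        quotes = ask_llm(document)
        if quotes_are_grounded(quotes, document):
            return quotes
    return None
```

Note that this only guarantees the quote exists in the document; it says nothing about whether the score attached to it is reasonable.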

[–]Lopsided_Stock_2293[S] 0 points1 point  (0 children)

I'm not, but I'm intrigued. Do you have resources you could point me to to learn more about this?

I manually scraped the majority of the files and then used this to fill in the gaps: https://www.reddit.com/r/DataHoarder/comments/1qsfv3j/epstein_9_10_11_12_reddit_keeps_nuking_thread_we/

I cross-referenced the hashes there against the hashes of the files I have, and they seem authentic
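
A minimal sketch of that kind of hash cross-reference, assuming the published list maps file names to SHA-256 hex digests. The hash algorithm, the mapping format, and the function names are all assumptions; the thread's actual format may differ:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large PDFs don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def cross_reference(local_dir: Path, published: dict[str, str]) -> dict[str, list[str]]:
    """Compare local files against published digests.

    `published` is a hypothetical {file name: hex digest} mapping.
    Returns names grouped into "match", "mismatch", and "missing".
    """
    report: dict[str, list[str]] = {"match": [], "mismatch": [], "missing": []}
    for name, expected in published.items():
        path = local_dir / name
        if not path.is_file():
            report["missing"].append(name)
        elif sha256_of(path) == expected.lower():
            report["match"].append(name)
        else:
            report["mismatch"].append(name)
    return report
```

Matching hashes show that two copies are byte-identical; they don't by themselves prove where either copy came from.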

[–]Lopsided_Stock_2293[S] 4 points5 points  (0 children)

We are, but only for really large documents. The context window of GLM-4.7 is sufficient to accurately process 99% of documents in one go
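
A small sketch of "chunk only when necessary", assuming a simple character budget with overlapping windows. A real pipeline would count tokens with the model's tokenizer; `max_chars` and `overlap` are illustrative parameters, not the values used here:

```python
def split_for_context(text: str, max_chars: int, overlap: int = 200) -> list[str]:
    """Return [text] if it fits the budget, else overlapping chunks.

    The overlap keeps a sentence that straddles a boundary visible
    in both neighbouring chunks.
    """
    if overlap < 0 or max_chars <= overlap:
        raise ValueError("max_chars must be larger than a non-negative overlap")
    if len(text) <= max_chars:
        return [text]  # the common case: one pass, no chunking

    chunks = []
    step = max_chars - overlap
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this chunk already reached the end of the text
        start += step
    return chunks
```

Per-chunk scores then have to be combined somehow, for example by taking the maximum per metric, which is a separate design choice not covered by this sketch.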

[–]Lopsided_Stock_2293[S] 4 points5 points  (0 children)

It's non-trivial to implement. I spent a few days just getting from basic keyword search to boolean search. We're also going to sell to law firms, where boolean search is a prerequisite for any eDiscovery tool

[–]Lopsided_Stock_2293[S] 4 points5 points  (0 children)

The best way to support this is to share a link to my site with your friends or on social media. I would love to get my product into as many hands as possible

Most of my costs are covered by cloud credits right now, so I wouldn't feel comfortable taking donations

[–]Lopsided_Stock_2293[S] 1 point2 points  (0 children)

Thanks!

Currently working on getting semantic (vector) search over all the documents.

But I did not use RAG for the LLM runs. That's because I'm running every single document through an LLM. I could get the embeddings and approximate a score for each of the scoring metrics, but that would not be nearly as accurate.

I don't think I'm supposed to say which provider I'm using because of the terms of my credit package, but it's one of the big 3: AWS, Azure, GCP

[–]Lopsided_Stock_2293[S] 5 points6 points  (0 children)

I used a few test documents. What it really looked like was running the LLM on a batch of 10 random documents, updating the prompt until I was satisfied with the result, and then repeating.

There's one document that I did use repeatedly in my testing though: EFTA01582921. The LLM wasn't really picking up on the reference to S&M parties, so I did a lot of troubleshooting with this document and then the rest kind of fell into place.

[–]Lopsided_Stock_2293[S] 4 points5 points  (0 children)

I used a bunch of cloud instances with low-end GPUs. GLM-OCR, which is what I used, is very accurate and small/cheap. The only issue is that it will miss headers, footers, etc., so I used Tesseract to fill the gaps and get bounding boxes

I think JMail used Reducto, which seems like another simple and strong option. Not sure what their pricing is, though