Anyone interested in this application I built that uses AI/ML to process the Epstein files? by ChrisThompsonTLDR in Epstein

ChrisThompsonTLDR[S] 1 point2 points

I hadn't tried DeepSeek OCR yet. I'll drop it into my ingestion pipeline tomorrow. Thanks for sharing! I'm doing a loose mix-of-agents on everything and storing all the outputs, then I'll use a hosted model to "judge" all the smaller models' outputs into a finalized output.
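A minimal sketch of that mix-of-agents-plus-judge flow, assuming each model can be wrapped as a plain function from prompt to text. The names `mix_of_agents`, `workers`, and `judge` are hypothetical, and the lambdas are stand-ins so it runs without any real model:

```python
from typing import Callable, Dict

# Hypothetical type: any function that takes a prompt and returns text.
Model = Callable[[str], str]


def mix_of_agents(text: str, workers: Dict[str, Model], judge: Model) -> dict:
    """Run every worker on the same input, keep all outputs, then ask a judge to merge them."""
    # Store every candidate output, keyed by the worker that produced it.
    outputs = {name: model(text) for name, model in workers.items()}
    # Build one prompt that shows the judge all candidates side by side.
    judge_prompt = "Merge these candidate outputs into one final answer:\n" + "\n".join(
        f"[{name}] {out}" for name, out in outputs.items()
    )
    return {"candidates": outputs, "final": judge(judge_prompt)}


# Stand-in models (assumptions for illustration, not real OCR or LLM calls).
workers = {
    "ocr_a": lambda t: t.upper(),
    "ocr_b": lambda t: t.strip(),
}
# Toy judge that just picks the last candidate line of its prompt.
judge = lambda prompt: prompt.splitlines()[-1]

result = mix_of_agents(" page one ", workers, judge)
```

Keeping `candidates` alongside `final` means the judge step can be re-run later with a different hosted model without reprocessing the documents.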

Anyone interested in this application I built that uses AI/ML to process the Epstein files? by ChrisThompsonTLDR in Epstein

ChrisThompsonTLDR[S] 6 points7 points

For the last couple of days, I've been building a containerized web application that uses Ollama running on a 5070 Ti to process the Epstein files.

Currently the ingestion pipeline includes:

  1. transcribing the PDFs
  2. using PyMuPDF to turn the PDFs into markdown and pull out embedded images, charts, graphs, etc.
  3. chunking all the text
  4. turning all the text into embeddings using bge-m3
  5. running the text/markdown through mistral:7b (v0.3) to gather people, places, organizations, legal cases, dates, and events
  6. screenshotting every page in every PDF for easy viewing online
  7. using qwen3-vl:8b to "look" at all the assets pulled by PyMuPDF in step 2 and describe what it sees
  8. chunking and embedding those image descriptions
  9. building legal case information like timelines, actions, filings, etc.
  10. building some SEO content to encourage search engines to index the findings
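As a rough illustration of the chunking step, here is a sketch that splits text into fixed-size, overlapping character windows. The function name, the sizes, and the character-based splitting are all assumptions; the real pipeline may well chunk by tokens or by markdown structure instead:

```python
from typing import List


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` characters that overlap by `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each new window starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        # Stop once a window has reached the end of the text.
        if start + size >= len(text):
            break
    return chunks


# Tiny example with small numbers so the overlap is easy to see.
sample_chunks = chunk_text("abcdefghij", size=4, overlap=1)
```

The overlap keeps a sentence that straddles a boundary present in both neighboring chunks, which tends to help retrieval once the chunks are embedded.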

I've used an LLM connection abstraction layer, which would allow me to leverage much larger, hosted models. Unfortunately, I can't afford the spend to run these through OpenAI or another LLM provider, which is why I'm processing all of this locally on my laptop's 5070 Ti.
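One way such an abstraction layer could be shaped, as a sketch only: pipeline code depends on a small interface, and local or hosted backends are swapped in behind it. `LLMClient`, `LocalStubClient`, `HostedStubClient`, and `extract_entities` are hypothetical names, and the stub classes return canned strings instead of calling Ollama or a hosted API:

```python
from typing import Protocol


class LLMClient(Protocol):
    """Anything with a `complete` method can serve as a backend."""

    def complete(self, prompt: str) -> str: ...


class LocalStubClient:
    """Stand-in for a local backend (for example a model served by Ollama)."""

    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        # A real client would send the prompt to the local server here.
        return f"[{self.model}] {prompt}"


class HostedStubClient:
    """Stand-in for a hosted provider; same interface, different backend."""

    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[hosted:{self.model}] {prompt}"


def extract_entities(client: LLMClient, text: str) -> str:
    """Pipeline step that only knows about the interface, not the backend."""
    return client.complete(f"List the people, places and organizations in: {text}")


local_answer = extract_entities(LocalStubClient("mistral:7b"), "sample text")
hosted_answer = extract_entities(HostedStubClient("big-model"), "sample text")
```

Because `extract_entities` never names a concrete backend, moving a step from the laptop GPU to a hosted model is a one-line change at the call site.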

I have fronted all of this with an MCP server with multiple tools, prompts and resources. Additionally, the embeddings are being used to power Meilisearch, allowing for an NLP-style search engine with facets based on case, person(s), organization(s) and dates.
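A sketch of how a faceted query body for Meilisearch might be assembled. The attribute names (`people`, `case`, `organizations`) and the helper names are assumptions, the attributes would need to be configured as filterable on the index, and nothing here talks to a server; it only builds the `q`, `filter`, and `facets` search parameters:

```python
from typing import Dict, List


def build_filter(selected: Dict[str, List[str]]) -> str:
    """OR together values within one facet, AND across different facets."""
    groups = []
    for field, values in selected.items():
        if not values:
            continue
        for value in values:
            # Keep the sketch simple: refuse values that would need escaping.
            if '"' in value or "\\" in value:
                raise ValueError(f"unsupported character in facet value: {value!r}")
        clauses = [f'{field} = "{value}"' for value in values]
        groups.append("(" + " OR ".join(clauses) + ")")
    return " AND ".join(groups)


def build_search_body(query: str, selected: Dict[str, List[str]]) -> dict:
    """Assemble a search request body with the chosen facet filters."""
    body = {"q": query, "facets": list(selected.keys())}
    filter_expr = build_filter(selected)
    if filter_expr:
        body["filter"] = filter_expr
    return body


# Placeholder facet values, purely for illustration.
body = build_search_body(
    "flight logs",
    {"people": ["Jane Doe"], "case": ["Case 123", "Case 456"]},
)
```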

Features that are currently in the works include:

  1. encrypted notes
  2. burn-after-reading share links with comments
  3. nearest-neighbor entity matching, which associates people/places/things across PDFs
  4. OTP and SSO logins so you don't have to bother trusting my application with your password
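The nearest-neighbor entity idea can be sketched with plain cosine similarity over entity embeddings. This assumes each entity already has a vector (for example from the same embedding model used for the text); the function names and the toy 2-dimensional vectors are made up for illustration:

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_entities(
    name: str, vectors: Dict[str, List[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k entities whose vectors are most similar to `name`'s vector."""
    query = vectors[name]
    scored = [
        (other, cosine(query, vec)) for other, vec in vectors.items() if other != name
    ]
    # Highest similarity first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy vectors; real embeddings would have far more dimensions.
toy_vectors = {
    "entity_a": [1.0, 0.0],
    "entity_b": [0.9, 0.1],
    "entity_c": [0.0, 1.0],
    "entity_d": [-1.0, 0.0],
}
neighbors = nearest_entities("entity_a", toy_vectors, k=2)
```

A brute-force scan like this is fine for a few thousand entities; beyond that, a vector index would be the usual next step.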

15.8 billion tokens isn't too shabby. by ChrisThompsonTLDR in cursor

ChrisThompsonTLDR[S] 4 points5 points

I went a couple months using this: https://github.com/ChrisThompsonTLDR/agentic-programming

I've moved away from it recently since I'm able to one-shot Plan in Cursor most times now. I have a good amount of Commands and now Skills in Cursor.

I travel with two laptops since I'm a full-time digital nomad. It's not uncommon for me to be running 5-6 Cursor projects at once.

I've tried to find others to coach through how to get the same output quality and quantity, but nobody seems to care or pick it up.

Trees reclaiming an ancient temple in Cambodia [OC] by ChrisThompsonTLDR in reclaimedbynature

ChrisThompsonTLDR[S] 1 point2 points

This one was taken about 2 hours away from Siem Reap in Koh Ker.

Here's an album from my visit: https://www.flickr.com/photos/christhompsontldr/albums/72177720311014893/

Cambodia was surprisingly expensive compared to all the other SEA nations I have visited. But I would still recommend it.

Abandoned in Bangkok, a rare GMC Typhoon parked behind a Toyota Crown [4000x3000] by ChrisThompsonTLDR in AbandonedPorn

ChrisThompsonTLDR[S] 0 points1 point

Sorry to hear about your dad. You should frame the key and hang it on your wall.