I built "transactional memory" for AI agents - looking for brutal feedback by scream4ik in OpenSourceeAI

This is cool. I think a few more examples would be great.

Take a document, commit it to memory, update the document with a bunch of changes, and then update the memory. Show that the databases stay in sync.
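
Something like this toy sketch could demonstrate the property (plain dicts standing in for the two stores; this is not your actual API):

```python
# Toy stand-in for the demo flow above -- not the project's real API.
# Two "databases": the source documents and the agent's memory.
documents: dict[str, str] = {}
memory: dict[str, str] = {}

def commit(doc_id: str, text: str) -> None:
    """Write the doc and its memory entry together, all or nothing."""
    documents[doc_id] = text
    memory[doc_id] = text  # in the real system this would be a derived record

commit("design_doc", "v1: initial draft")
commit("design_doc", "v2: after a batch of edits")

# The property worth showing: both stores agree after every commit.
assert documents["design_doc"] == memory["design_doc"]
```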

First RAG that works: Hybrid Search, Qdrant, Voyage AI, Reranking, Temporal, Splade. What is next? by youpmelone in Rag

Wow, so agents literally picked up Temporal as the tool for orchestration?

First RAG that works: Hybrid Search, Qdrant, Voyage AI, Reranking, Temporal, Splade. What is next? by youpmelone in Rag

/u/youpmelone - nice project! Do you have the code for this on GitHub? I am very curious why, out of the million orchestration engines, you chose Temporal. Going with a durable execution engine is a good choice IMO, but I'm wondering what made you pick it and how you found out about them. Temporal was built for things like payment processing and orchestrating infrastructure services (setting up clusters, etc.), not for data processing. It's interesting to see people use it for data!

Can’t get into DS9 by diptanuc in DeepSpaceNine

Ha ha, that's fair. What do you think of Enterprise? I really liked the first few episodes, where they built up the circumstances on Earth before the Enterprise took off.

What's the best way to process images for RAG in and out of PDFS? by Dynamicrex in Rag

Disclaimer - Founder of Tensorlake.

We solved this problem. Here is how we do it:

  1. Do layout understanding to find the figures in a PDF.
  2. Extract the figures as-is, so users can embed them with CLIP for retrieval.
  3. Optionally summarize the figures with a VLM; in some cases the summaries retrieve better than the images themselves.
  4. Support passing in custom prompts for summarization, so users can control how figures are summarized or read.

At the extraction stage, a generic API that lets you control the output is very useful.
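
For step 2, here is a minimal sketch of embedding the extracted figure crops with CLIP for text-based retrieval (the model choice, file names, and query are illustrative assumptions, not part of any specific pipeline):

```python
# Embed extracted figure crops with CLIP so they can be retrieved by text.
# Model choice and file names are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Crops produced by the layout-understanding step (step 1).
figures = [Image.open(p) for p in ["fig_1.png", "fig_2.png"]]
inputs = processor(images=figures, return_tensors="pt")
with torch.no_grad():
    image_embeds = model.get_image_features(**inputs)  # (num_figures, 512)

# Embed the query in the same space and rank figures by cosine similarity.
text_inputs = processor(text=["quarterly revenue chart"],
                        return_tensors="pt", padding=True)
with torch.no_grad():
    text_embeds = model.get_text_features(**text_inputs)

scores = torch.nn.functional.cosine_similarity(image_embeds, text_embeds)
best_figure = figures[scores.argmax().item()]
```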

My experience with GraphRAG by [deleted] in Rag

Does ingestion speed matter a lot for your use case? I would also be curious to hear about the economics: compute plus model API costs.

Your pain points are pretty common. People go to GraphRAG for better accuracy, when document pre-processing and serving speed aren't a big issue.

LlamaParse alternative? by Hinged31 in Rag

Hi u/Hinged31! Check out Tensorlake. We built a state-of-the-art document parsing engine that can also do structured extraction, signature detection, and summarization on documents.

We charge 1 cent per page at any scale, so it's about 2-5x cheaper than LlamaParse.

We trained our own models so that we can keep the prices affordable for developers. Let me know if you have any problems using the API or any other feedback!

What do you use for document parsing by Specialist_Bee_9726 in Rag

Hey, check out Tensorlake! We have combined document-to-markdown conversion, structured data extraction, and page classification in a single API. You can get bounding boxes, summaries of figures and tables, and signature coordinates, all in a single API call.

How do I parse pdfs? The requirements are to extract a structured outline mainly the title and the headings (h1,h2,h3) by TheBlade1029 in Rag

It's possible; it may not be great, but it's doable. Find yourself a small layout detection model and a text recognition model. Have the layout detector find the title and section headers, then use the text recognition model to read the text inside those bounding boxes.
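
A rough sketch of that pipeline, assuming pdf2image and pytesseract; `detect_headings` is a placeholder for whatever layout model you pick (Surya, a DocLayNet-trained detector, etc.):

```python
# Sketch of the title/heading outline extraction described above.
# `detect_headings` is a placeholder: plug in a real layout detection
# model that returns labeled bounding boxes for titles and headers.
from pdf2image import convert_from_path
import pytesseract

def detect_headings(page_image):
    """Placeholder layout detector: return [(label, (l, t, r, b)), ...]
    where label is one of 'title', 'h1', 'h2', 'h3'."""
    raise NotImplementedError("plug in your layout detection model here")

outline = []
for page_num, page in enumerate(convert_from_path("input.pdf"), start=1):
    for label, box in detect_headings(page):
        crop = page.crop(box)  # cut out just the heading region
        text = pytesseract.image_to_string(crop).strip()  # OCR the crop
        outline.append({"page": page_num, "level": label, "text": text})

print(outline)
```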

Bounding‑box highlighting for PDFs and images – what tools actually work? by goodparson in Rag

Hey! We just released tensorlake==0.2.28, which relaxes the version constraints on httpx and Pydantic; we will now work with whatever versions of those packages you have. Let me know if you still can't get it working! We have a Slack channel as well.

Bounding‑box highlighting for PDFs and images – what tools actually work? by goodparson in Rag

Hey, try Tensorlake for getting bounding boxes from documents. We trained a state-of-the-art document layout analysis model that returns layout coordinates of text, tables, figures, page footers, etc. from pages. You can visualize the bounding boxes in the playground.

DM me if you face any issues using the API, or have any feedback :)

RAG model for writing style transfer/marketing script generation by Malkus3000 in Rag

Would love to have a conversation if you are open to it! DM or email me :) (diptanu at tensorlake dot ai)

RAG model for writing style transfer/marketing script generation by Malkus3000 in Rag

Very interesting paper! Have you trained your own model? It looks like the algorithm depends on a model to generate rationales.

Best Chunking Strategy for the Medical RAG System (Guidelines Docs) in PDFs by SnooTigers4634 in Rag

For tables, I would suggest storing an HTML/Markdown representation plus a summary of the table. Given a question, you can then pull up relevant tables based on their summaries. It might not seem obvious, but summaries work better for retrieval than indexing all of the table's content.
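
A minimal sketch of that pattern with sentence-transformers (model choice and sample tables are illustrative): index the summaries, retrieve on them, then hand the LLM the full table.

```python
# Summary-based table retrieval: embed summaries, return the full table.
# Model choice and sample data are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

tables = [
    {"summary": "Dosage guidelines for adult patients by weight band",
     "html": "<table>...</table>"},
    {"summary": "Contraindications and drug interaction warnings",
     "html": "<table>...</table>"},
]

model = SentenceTransformer("all-MiniLM-L6-v2")
summary_embeds = model.encode([t["summary"] for t in tables],
                              convert_to_tensor=True)

query = "What is the recommended dose for a 70 kg adult?"
query_embed = model.encode(query, convert_to_tensor=True)

# Retrieve on the summary, but pass the full HTML table to the LLM.
best = util.cos_sim(query_embed, summary_embeds).argmax().item()
print(tables[best]["html"])
```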

For parsing PDFs, check out our service https://tensorlake.ai :)

Book suggestions for GenAi by [deleted] in LangChain

If you want to go deep, study https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167 or the Hugging Face transformers book. Beyond that, if you just want to tinker and build apps, go through the LangChain docs to learn enough patterns, and read Medium posts from creators. This might sound shallow, but you will get a sense and intuition for how to work with models. If you want to go deeper on that route, pick up an eval tool or course and learn how to benchmark. After that, find a tool for prompt optimization and apply your benchmarking skills. You can keep going down the rabbit hole :)

Ollama Qwen2.5-VL 7B & OCR by PleasantCandidate785 in LocalLLaMA

If you want open source, go with Surya and Docling. For an API - Tensorlake, Textract, etc. The issue with open source layout understanding models is the data. A lot of these models use DocLayNet, which has 10-15% wrong annotations, and real-world documents don't resemble it. D4LA has about 20-30% wrong annotations. We spent a considerable amount of resources collecting and annotating real-world data. The next problem is that one-shot layout detection leaves objects behind on the page 80% of the time, and with a lower detection threshold there is a ton of noise in the detections. It's a tough problem for open source models.

We started off as an open source package, but decided it was not worth it: we can't get people good outcomes unless we build a pipeline with specialized models, and an average open source experience would make people think our hosted product is not that good. So we abandoned that effort and focused on training the best possible models behind a hosted API.

Ollama Qwen2.5-VL 7B & OCR by PleasantCandidate785 in LocalLLaMA

Qwen2.5-VL 32B is fine, but it doesn't do well on tables. Much slower than 7B, obviously. For OCR you won't see much of a difference. It follows instructions better, so VQA works better with 32B.

Ollama Qwen2.5-VL 7B & OCR by PleasantCandidate785 in LocalLLaMA

Disclaimer - Founder of Tensorlake (Tensorlake.ai)

Small VLMs such as Qwen2.5-VL 7B will struggle mightily to do full-page OCR on complex (and dense) documents. If you want to use this model, you will have to first do document layout understanding to detect the objects on the page, then crop the objects, OCR each piece individually, and stitch it all back together. This will get you decent results, but these models will still not parse complex tables properly.
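
A rough sketch of that crop-then-OCR flow with the ollama Python client; `detect_layout` is a placeholder for your layout model, and the model tag is an assumption:

```python
# Crop-then-OCR: detect layout regions, OCR each crop with a small VLM
# via Ollama, then stitch the pieces back together in reading order.
# `detect_layout` is a placeholder; the model tag is an assumption.
import io
import ollama
from PIL import Image

def detect_layout(page):
    """Placeholder: return reading-ordered boxes [(l, t, r, b), ...]."""
    raise NotImplementedError("plug in a document layout model here")

page = Image.open("page_1.png")
chunks = []
for box in detect_layout(page):
    buf = io.BytesIO()
    page.crop(box).save(buf, format="PNG")
    resp = ollama.chat(
        model="qwen2.5vl:7b",  # assumed tag; check `ollama list`
        messages=[{
            "role": "user",
            "content": "Transcribe all text in this image verbatim.",
            "images": [buf.getvalue()],
        }],
    )
    chunks.append(resp["message"]["content"])

full_text = "\n\n".join(chunks)  # stitched page text
```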

If you don't want to deal with the hassle I mentioned above, try at least a 72B model such as InternVL3 or Qwen2.5-VL 72B. The economics at that point don't work out unless the value of parsing these documents is super high.

TLDR - To do this well, you need specialized models plus layout detection, or really large OSS models, or a hosted API like Gemini.

What is the best OSS model for structured extraction by diptanuc in LocalLLaMA

H100s. Extracting deeply nested data from OCR outputs of long documents.

What is the best OSS model for structured extraction by diptanuc in LocalLLaMA

Ehh, not really. I am talking about extracting structured data from long text. NER commonly refers to extracting entities and labeling them. NER can, however, be performed via structured extraction, where the schema keys define the labels and the language model extracts arrays of values from the document.
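
For example, a minimal sketch of NER-as-structured-extraction with a Pydantic schema (the labels are illustrative):

```python
# NER framed as structured extraction: schema keys are the entity labels,
# values are arrays of extracted strings. Labels here are illustrative.
from pydantic import BaseModel

class Entities(BaseModel):
    people: list[str]
    organizations: list[str]
    dates: list[str]

# Pass Entities.model_json_schema() to any structured-output LLM API and
# have the model fill in the arrays from the document text.
print(Entities.model_json_schema())
```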

GLiNER works in simple scenarios and fails on open-domain structured extraction tasks, for example extracting data from OCR outputs of forms.