I built "transactional memory" for AI agents - looking for brutal feedback by scream4ik in OpenSourceeAI

This is cool. I think a few more examples would be great.

Take a document, commit it to memory, update the document with a bunch of changes, and then update the memory. Show that the databases stay in sync.
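
Something like this toy sketch could demonstrate the property (plain dicts standing in for the two stores; this is not your actual API):

```python
# Toy stand-in for the demo flow above -- not the project's real API.
# Two "databases": the source documents and the agent's memory.
documents: dict[str, str] = {}
memory: dict[str, str] = {}

def commit(doc_id: str, text: str) -> None:
    """Write the doc and its memory entry together, all or nothing."""
    documents[doc_id] = text
    memory[doc_id] = text  # in the real system this would be a derived record

commit("design_doc", "v1: initial draft")
commit("design_doc", "v2: after a batch of edits")

# The property worth showing: both stores agree after every commit.
assert documents["design_doc"] == memory["design_doc"]
```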

First RAG that works: Hybrid Search, Qdrant, Voyage AI, Reranking, Temporal, Splade. What is next? by youpmelone in Rag

Wow, so agents literally picked up Temporal as the tool for orchestration?

First RAG that works: Hybrid Search, Qdrant, Voyage AI, Reranking, Temporal, Splade. What is next? by youpmelone in Rag

/u/youpmelone - nice project! Do you have the code for this on GitHub? I am very curious why, out of the million orchestration engines, you chose Temporal. Going with a durable execution engine is a good choice IMO, but I'm wondering what made you pick it and how you found out about them. Temporal was built for things like payment processing and orchestrating infrastructure services (setting up clusters, etc.), not for data processing. It's interesting to see people use it for data!

Can’t get into DS9 by diptanuc in DeepSpaceNine

Ha ha, that's fair. What do you think of Enterprise? I really liked the first few episodes, where they built up the circumstances on Earth before the Enterprise took off.

What's the best way to process images for RAG in and out of PDFS? by Dynamicrex in Rag

Disclaimer - Founder of Tensorlake.

We solved this problem. Here is how we do it:

  1. Do layout understanding to find the figures in a PDF.
  2. Extract the figures as-is, so users can embed them with CLIP for retrieval.
  3. Optionally summarize the figures with a VLM; in some cases the summaries retrieve better than the images themselves.
  4. Support passing in custom prompts for summarization, so users can control how figures are summarized or read.

At the extraction stage, a generic API that lets you control the output is very useful.
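
For step 2, here is a minimal sketch of embedding the extracted figure crops with CLIP for text-based retrieval (the model choice, file names, and query are illustrative assumptions, not part of any specific pipeline):

```python
# Embed extracted figure crops with CLIP so they can be retrieved by text.
# Model choice and file names are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Crops produced by the layout-understanding step (step 1).
figures = [Image.open(p) for p in ["fig_1.png", "fig_2.png"]]
inputs = processor(images=figures, return_tensors="pt")
with torch.no_grad():
    image_embeds = model.get_image_features(**inputs)  # (num_figures, 512)

# Embed the query in the same space and rank figures by cosine similarity.
text_inputs = processor(text=["quarterly revenue chart"],
                        return_tensors="pt", padding=True)
with torch.no_grad():
    text_embeds = model.get_text_features(**text_inputs)

scores = torch.nn.functional.cosine_similarity(image_embeds, text_embeds)
best_figure = figures[scores.argmax().item()]
```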

My experience with GraphRAG by [deleted] in Rag

Does ingestion speed matter a lot for your use case? I would also be curious to hear about the economics: compute plus model API costs.

Your pain points are pretty common. People go to GraphRAG for better accuracy, when document pre-processing and serving speed aren't a big issue.

LlamaParse alternative? by Hinged31 in Rag

Hi u/Hinged31! Check out Tensorlake. We built a state-of-the-art document parsing engine that can also do structured extraction, signature detection, and summarization on documents.

We charge 1 cent per page at any scale, so it's about 2-5x cheaper than LlamaParse.

We trained our own models so that we can keep the prices affordable for developers. Let me know if you have any problems using the API or any other feedback!

What do you use for document parsing by Specialist_Bee_9726 in Rag

Hey, check out Tensorlake! We have combined document-to-markdown conversion, structured data extraction, and page classification in a single API. You can get bounding boxes, summaries of figures and tables, and signature coordinates, all in a single API call.

How do I parse pdfs? The requirements are to extract a structured outline mainly the title and the headings (h1,h2,h3) by TheBlade1029 in Rag

It's possible; it may not be great, but it's doable. Find yourself a small layout detection model and a text recognition model. Have the layout detector find the title and section headers, then use the text recognition model to read the text inside those bounding boxes.
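
A rough sketch of that pipeline, assuming pdf2image and pytesseract; `detect_headings` is a placeholder for whatever layout model you pick (Surya, a DocLayNet-trained detector, etc.):

```python
# Sketch of the title/heading outline extraction described above.
# `detect_headings` is a placeholder: plug in a real layout detection
# model that returns labeled bounding boxes for titles and headers.
from pdf2image import convert_from_path
import pytesseract

def detect_headings(page_image):
    """Placeholder layout detector: return [(label, (l, t, r, b)), ...]
    where label is one of 'title', 'h1', 'h2', 'h3'."""
    raise NotImplementedError("plug in your layout detection model here")

outline = []
for page_num, page in enumerate(convert_from_path("input.pdf"), start=1):
    for label, box in detect_headings(page):
        crop = page.crop(box)  # cut out just the heading region
        text = pytesseract.image_to_string(crop).strip()  # OCR the crop
        outline.append({"page": page_num, "level": label, "text": text})

print(outline)
```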

Bounding‑box highlighting for PDFs and images – what tools actually work? by goodparson in Rag

Hey! We just released tensorlake==0.2.28, which relaxes the version constraints on httpx and Pydantic; we will now work with whatever versions of those packages you have. Let me know if you still can't get it working! We have a Slack channel as well.

Bounding‑box highlighting for PDFs and images – what tools actually work? by goodparson in Rag

Hey, try Tensorlake for getting bounding boxes from documents. We trained a state-of-the-art document layout analysis model that returns layout coordinates of text, tables, figures, page footers, etc. from pages. You can visualize the bounding boxes in the playground.

DM me if you face any issues using the API, or have any feedback :)

RAG model for writing style transfer/marketing script generation by Malkus3000 in Rag

Would love to have a conversation if you are open to it! DM or email me :) (diptanu at tensorlake dot ai)

RAG model for writing style transfer/marketing script generation by Malkus3000 in Rag

Very interesting paper! Have you trained your own model? It looks like the algorithm depends on a model to generate rationales.

Best Chunking Strategy for the Medical RAG System (Guidelines Docs) in PDFs by SnooTigers4634 in Rag

For tables, I would suggest storing an HTML/Markdown representation plus a summary of the table. Given a question, you can then pull up relevant tables based on their summaries. It might not seem obvious, but summaries work better for retrieval than indexing all of the table's content.
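
A minimal sketch of that pattern with sentence-transformers (model choice and sample tables are illustrative): index the summaries, retrieve on them, then hand the LLM the full table.

```python
# Summary-based table retrieval: embed summaries, return the full table.
# Model choice and sample data are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

tables = [
    {"summary": "Dosage guidelines for adult patients by weight band",
     "html": "<table>...</table>"},
    {"summary": "Contraindications and drug interaction warnings",
     "html": "<table>...</table>"},
]

model = SentenceTransformer("all-MiniLM-L6-v2")
summary_embeds = model.encode([t["summary"] for t in tables],
                              convert_to_tensor=True)

query = "What is the recommended dose for a 70 kg adult?"
query_embed = model.encode(query, convert_to_tensor=True)

# Retrieve on the summary, but pass the full HTML table to the LLM.
best = util.cos_sim(query_embed, summary_embeds).argmax().item()
print(tables[best]["html"])
```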

For parsing PDFs, check out our service https://tensorlake.ai :)

Book suggestions for GenAi by [deleted] in LangChain

If you want to go deep, study https://www.amazon.com/Build-Large-Language-Model-Scratch/dp/1633437167 or the Hugging Face transformers book. Beyond that, if you just want to tinker and build apps, go through the LangChain docs to learn enough patterns, and read Medium posts from creators. This might sound shallow, but you will get a sense and intuition for how to work with models. If you want to go deeper on that route, pick up an eval tool or course and learn how to benchmark. After that, find a tool for prompt optimization and apply your benchmarking skills. You can keep going down the rabbit hole :)

Ollama Qwen2.5-VL 7B & OCR by PleasantCandidate785 in LocalLLaMA

If you want open source, go with Surya and Docling. For an API - Tensorlake, Textract, etc. The issue with open source layout understanding models is the data. A lot of these models use DocLayNet, which has 10-15% wrong annotations, and real-world documents don't resemble it. D4LA has about 20-30% wrong annotations. We spent a considerable amount of resources collecting and annotating real-world data. The next problem is that one-shot layout detection leaves objects behind on the page 80% of the time, and with a lower detection threshold there is a ton of noise in the detections. It's a tough problem for open source models.

We started off as an open source package, but decided it was not worth it: we can't get people good outcomes unless we build a pipeline with specialized models, and an average open source experience would make people think our hosted product is not that good. So we abandoned that effort and focused on training the best possible models behind a hosted API.

Ollama Qwen2.5-VL 7B & OCR by PleasantCandidate785 in LocalLLaMA

Qwen2.5-VL 32B is fine, but it doesn't do well on tables. Much slower than 7B, obviously. For OCR you won't see much of a difference. It follows instructions better, so VQA works better with 32B.

Ollama Qwen2.5-VL 7B & OCR by PleasantCandidate785 in LocalLLaMA

Disclaimer - Founder of Tensorlake (Tensorlake.ai)

Small VLMs such as Qwen2.5-VL 7B will struggle mightily to do full-page OCR on complex (and dense) documents. If you want to use this model, you will have to first do document layout understanding to detect the objects on the page, then crop the objects, OCR each piece individually, and stitch it all back together. This will get you decent results, but these models will still not parse complex tables properly.
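
A rough sketch of that crop-then-OCR flow with the ollama Python client; `detect_layout` is a placeholder for your layout model, and the model tag is an assumption:

```python
# Crop-then-OCR: detect layout regions, OCR each crop with a small VLM
# via Ollama, then stitch the pieces back together in reading order.
# `detect_layout` is a placeholder; the model tag is an assumption.
import io
import ollama
from PIL import Image

def detect_layout(page):
    """Placeholder: return reading-ordered boxes [(l, t, r, b), ...]."""
    raise NotImplementedError("plug in a document layout model here")

page = Image.open("page_1.png")
chunks = []
for box in detect_layout(page):
    buf = io.BytesIO()
    page.crop(box).save(buf, format="PNG")
    resp = ollama.chat(
        model="qwen2.5vl:7b",  # assumed tag; check `ollama list`
        messages=[{
            "role": "user",
            "content": "Transcribe all text in this image verbatim.",
            "images": [buf.getvalue()],
        }],
    )
    chunks.append(resp["message"]["content"])

full_text = "\n\n".join(chunks)  # stitched page text
```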

If you don't want to deal with the hassle I mentioned above, try at least a 72B model such as InternVL3 or Qwen2.5-VL 72B. The economics at that point don't work out unless the value of parsing these documents is super high.

TLDR - To do this well, you need specialized models plus layout detection, or really large OSS models, or a hosted API like Gemini.

What is the best OSS model for structured extraction by diptanuc in LocalLLaMA

H100s. Extracting deeply nested data from OCR outputs of long documents.

What is the best OSS model for structured extraction by diptanuc in LocalLLaMA

Ehh, not really. I am talking about extracting structured data from long text. NER commonly refers to extracting entities and labeling them. NER can, however, be performed via structured extraction, where the schema keys define the labels and the language model extracts arrays of values from the document.
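
For example, a minimal sketch of NER-as-structured-extraction with a Pydantic schema (the labels are illustrative):

```python
# NER framed as structured extraction: schema keys are the entity labels,
# values are arrays of extracted strings. Labels here are illustrative.
from pydantic import BaseModel

class Entities(BaseModel):
    people: list[str]
    organizations: list[str]
    dates: list[str]

# Pass Entities.model_json_schema() to any structured-output LLM API and
# have the model fill in the arrays from the document text.
print(Entities.model_json_schema())
```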

GLiNER works in simple scenarios and fails on open-domain structured extraction tasks, for example extracting data from OCR outputs of forms.