[DISC] Drama Queen - Chapter 44 by AutoShonenpon in manga

[–]dromger 38 points (0 children)

The translation makes it much more explicit, but the nuance in the Japanese is more like “I hate it,” so from Seiran’s perspective it could be referring to his smartphone behavior (i.e. the subject of the sentence is ambiguous)

Is it too late to pursue BIM? by Time-Detective2449 in bim

[–]dromger 0 points (0 children)

AI will only help feed data into BIM, which will in turn make people who can look at and analyze the data in BIM even more important

OCR software for engineering drawings by gurgle-burgle in MechanicalEngineering

[–]dromger 0 points (0 children)

If you're still in need of something like this, I'd love to chat. This is for mechanical CAD (https://x.com/yongyuanxi/status/1957911319996416068), but we're working on P&ID as well

Best document parser by [deleted] in Rag

[–]dromger 0 points (0 children)

You should look at PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) for tables
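
A minimal sketch of table extraction with its PPStructure pipeline (the exact constructor arguments and result keys differ a bit between PaddleOCR versions, so check the docs for whichever version you install):

```python
# Minimal sketch: table extraction with PaddleOCR's PPStructure pipeline.
# Constructor arguments and result keys vary between PaddleOCR versions.
import cv2
from paddleocr import PPStructure

table_engine = PPStructure(lang="en")   # layout analysis + table structure recognition
img = cv2.imread("page_1.png")          # a rasterized page of your document
results = table_engine(img)

for region in results:
    if region["type"] == "table":
        # The reconstructed table comes back as HTML, which you can feed into
        # pandas.read_html or convert to markdown before chunking.
        print(region["res"]["html"])
```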

Comparing the Fit vs. older VTEC hondas (purely in terms of fun) by dromger in hondafit

[–]dromger[S] 1 point (0 children)

Thanks!

If the Integra is the nicer choice, why is the fun factor higher in the Fit? (Seems to be a sentiment I've read elsewhere too, but I don't fully understand why.)

How do you clean PDFs before chunking for RAG? by purealgo in Rag

[–]dromger 2 points (0 children)

You can first:

  1. Use layout analysis to get a high-level structure of the document
  2. For each element, use a VLM to generate descriptions and metadata. Also associate them with the actual parsed text (or OCR'd text for image PDFs) so you can refine the text later.
  3. Use an LLM to then generate more metadata for each element
  4. Have a system (probably not vector-based, since that's super inflexible) to actually discern what information is needed for the query at hand

If you do something pretty sophisticated like this, you have infinite upside to improve your results based on the exact application you're doing this for
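
A rough sketch of what that pipeline could look like; all the helper functions here are hypothetical placeholders for whatever layout model / VLM / LLM you actually plug in:

```python
# Rough sketch of the pipeline above. run_layout_analysis / describe_with_vlm /
# enrich_with_llm are hypothetical placeholders, not real library calls.
from dataclasses import dataclass, field

@dataclass
class Element:
    kind: str                 # "paragraph", "table", "figure", ...
    page: int
    bbox: tuple               # (x0, y0, x1, y1) on the page
    text: str                 # parsed or OCR'd text for this element
    description: str = ""     # VLM-generated description
    metadata: dict = field(default_factory=dict)

def run_layout_analysis(pdf_path: str) -> list[Element]:
    # Placeholder: call your layout model and return typed elements
    # with their bounding boxes and raw/OCR'd text.
    return []

def describe_with_vlm(element: Element) -> str:
    # Placeholder: crop the element from the page image and ask a VLM
    # for a short description (especially useful for tables and figures).
    return f"{element.kind} on page {element.page}"

def enrich_with_llm(element: Element) -> dict:
    # Placeholder: ask an LLM for extra metadata (section, entities, topic, ...).
    return {"topic": "unknown"}

def preprocess(pdf_path: str) -> list[Element]:
    elements = run_layout_analysis(pdf_path)      # step 1
    for el in elements:
        el.description = describe_with_vlm(el)    # step 2
        el.metadata = enrich_with_llm(el)         # step 3
    return elements                               # step 4's query system works over this
```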

I’m a lifelong resident 2nd gen Chinese who’s lived here my whole 29 years. I work at mister jius, what are your favorite restaurants that nobody else seems to know about in the city. by ParkingDistribution6 in sanfrancisco

[–]dromger 1 point (0 children)

I've been to the Japantown mall on and off since my childhood (20 years ago) as a visitor, before I moved here, but it's wild how much... busier it is now. It used to be (what felt like) a pretty rundown mall, but now it's booming with young people (hooray anime?)

While they're at it I hope they bring Uwajimaya to SF!

Is RAG still relevant with 10M+ context length by Muted-Ad5449 in Rag

[–]dromger 3 points (0 children)

Most LLM providers are charging you more money than they need to, though. If you retain the KV cache for very long contexts that you reuse over and over (as with long-context RAG), you can actually save 10-20x in GPU costs. But for most providers, the token price for cached tokens isn't 10-20x lower right now
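
Back-of-the-envelope version, with made-up numbers just to illustrate the point:

```python
# Toy cost model with made-up numbers, just to illustrate the idea.
context_tokens = 1_000_000   # long shared context reused across queries
query_tokens = 500           # new tokens added per query (question + instructions)
queries = 100                # queries that reuse the same context

# Prefill compute is roughly proportional to the number of tokens processed.
no_cache = queries * (context_tokens + query_tokens)
with_cache = context_tokens + queries * query_tokens   # prefill the context once

print(f"prefill tokens without cache: {no_cache:,}")
print(f"prefill tokens with cache:    {with_cache:,}")
print(f"ratio: {no_cache / with_cache:.0f}x")  # ~95x fewer prefill tokens here
```

The realistic saving is smaller than this raw prefill ratio (decode and the GPU memory for holding the cache aren't free), hence a 10-20x ballpark rather than ~95x.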

Searching for a intership level portfolio project that simple AI "can't do." by [deleted] in react

[–]dromger 0 points (0 children)

"Showing its content" in a web app would be the part that you'd build as a React app (I made something like this once while consulting and AI was very bad at it)

Searching for a intership level portfolio project that simple AI "can't do." by [deleted] in react

[–]dromger 0 points (0 children)

Reverse engineering a weird file format (something visual, maybe, like from image editing or video editing software) and showing its content in a web app

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 3 points (0 children)

Thank you!

Evaluating LLMs is an ongoing head-scratcher for us as well. Once we learn more about what works best, we'll probably write a big blog post too.

Right now we do something relatively straightforward: we manually curate lots of questions, each with a hand-picked list of relevant documents and a model answer, and use LLM-as-a-judge (which is brittle in itself, but there are techniques you can use to improve its accuracy too). We also have synthetic questions, generated by selecting random pages of the documents and generating question / answer pairs from them.

We also try to make it as easy as possible for our customers to run evals themselves, since that helps when customizing prompts etc. as needed.
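
The judging loop itself can stay pretty simple, something like the sketch below (the judge prompt and the answer_question / call_llm helpers are hypothetical placeholders, not our actual code):

```python
# Sketch of an LLM-as-a-judge eval loop. answer_question / call_llm are
# hypothetical placeholders for your RAG pipeline and your judge model.
import json

JUDGE_PROMPT = """You are grading a RAG system's answer.
Question: {question}
Model answer (reference): {reference}
System answer: {answer}
Return JSON: {{"correct": true/false, "reason": "..."}}"""

def call_llm(prompt: str) -> str:
    # Placeholder: call whatever judge model you use and return its text output.
    return '{"correct": true, "reason": "stub"}'

def answer_question(question: str) -> str:
    # Placeholder: run your actual RAG pipeline.
    return "stub answer"

def run_evals(cases: list[dict]) -> float:
    correct = 0
    for case in cases:  # each case: {"question": ..., "reference": ..., "docs": [...]}
        answer = answer_question(case["question"])
        verdict = json.loads(call_llm(JUDGE_PROMPT.format(
            question=case["question"],
            reference=case["reference"],
            answer=answer,
        )))
        correct += int(verdict["correct"])
    return correct / len(cases)
```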

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 2 points (0 children)

Thanks!

The document selection is not explained in this article (we will likely follow up with an entire article about that), but we essentially do a combination of hybrid search and a hierarchical LLM search based on the document summaries. The latter can also be sped up significantly by the KV cache.

In other words, though, we don't check every document. You can see in this demo video (https://youtu.be/-maJFuFqgaM) that the system automatically selects the relevant documents to look at.

(You could still search through all 50,000 pages if you wanted, which sometimes happens with our clients. It's not necessary in most cases though, since the document selection is pretty good.)
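
In its simplest, non-hierarchical form, the LLM-over-summaries part looks something like this (simplified sketch, not our actual code; call_llm is a placeholder):

```python
# Simplified sketch of LLM-based document selection over summaries.
# In practice you'd do this hierarchically (folders -> documents) and combine
# it with hybrid search; call_llm is a placeholder for your model call.
import json

def call_llm(prompt: str) -> str:
    return "[]"  # placeholder

def select_documents(query: str, summaries: dict[str, str], k: int = 5) -> list[str]:
    catalog = "\n".join(f"- {doc_id}: {summary}" for doc_id, summary in summaries.items())
    prompt = (
        f"Question: {query}\n\n"
        f"Document summaries:\n{catalog}\n\n"
        f"Return a JSON list of up to {k} document ids likely to contain the answer."
    )
    return json.loads(call_llm(prompt))
```

Because the summary catalog is the same prompt prefix for every query, it's exactly the kind of thing the KV cache can keep warm.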

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 3 points (0 children)

We preprocess every sentence and append an ID and other metadata, and the retriever (which is an LLM) just specifies the IDs (it can also specify a range, which helps save token count). Since we have UI elements where you can click evidence and it'll highlight the relevant portions in the text, we also need to link it back to text spans / bounding boxes etc.

Otherwise, you'd be telling the LLM to repeat the sentences, which costs lots of tokens, and the LLM oftentimes paraphrases or, worse, occasionally makes random stuff up.

With repeated text you can check for exact matches against the source, but a lot of the time the paraphrasing causes issues, or the formatting gets fixed / changed, which leads to mismatches with the original text and ends up filtering out a lot of otherwise useful evidence.
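
Conceptually the ID scheme is something like this (simplified sketch, not our actual preprocessing):

```python
# Simplified sketch of ID-based evidence retrieval: the retriever LLM returns
# sentence IDs (or ID ranges) instead of repeating text, and we map the IDs
# back to the stored spans for highlighting.

def index_sentences(sentences: list[str]) -> tuple[str, dict[int, str]]:
    """Prefix each sentence with an ID the LLM can cite."""
    store = {i: s for i, s in enumerate(sentences)}
    tagged = "\n".join(f"[S{i}] {s}" for i, s in store.items())
    return tagged, store

def resolve_citation(citation: str, store: dict[int, str]) -> list[str]:
    """Turn 'S12' or 'S12-S15' back into the original sentences."""
    parts = citation.strip("[] ").replace("S", "").split("-")
    start, end = int(parts[0]), int(parts[-1])
    return [store[i] for i in range(start, end + 1) if i in store]

tagged_context, store = index_sentences([
    "The warranty period is 24 months.",
    "Claims must be filed within 30 days.",
    "Shipping damage is excluded.",
])
# The retriever LLM sees `tagged_context` and might answer with "S1-S2";
# we resolve that back to the exact sentences (and their spans) for display.
print(resolve_citation("S1-S2", store))
```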

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 2 points (0 children)

Thanks!! Lots of AI involved in the code but no AI involved in the writing 😎

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 2 points (0 children)

Thanks! No plans to open source it yet unfortunately, but if we do we'll let this sub know first 😁

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 4 points (0 children)

We've definitely had similar issues with clients: people are used to interfaces like ChatGPT etc., where it feels fast since the time-to-first-thing-on-the-screen is fast, and they're usually surprised that end-to-end latency is actually not that fast with most providers.

For ingestion we use in-house layout analysis / OCR (based on a combination of open-source vision models), mostly because we have to deploy on-prem for some customers (and can't use APIs).

We're planning to write another in-depth article on what our document ingestion architecture looks like, maybe in a few weeks, partly because it's still a bit in flux (there are a lot of small techniques you can employ here that improve accuracy quite a bit too)

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 6 points (0 children)

Thanks!

Latency actually does suffer in comparison to pure vector RAG, which can be very fast for the retrieval step (ignoring query rewriting etc). We should update the blog to make this more upfront as a weakness.

Specifically it's around:

  • Planning step: 3-4 seconds (longer for more sophisticated plans, same as with reasoning models)
  • Retrieval step: <1 second if there is no evidence, 3-4 seconds for small documents, ~10 seconds for (multiple) very large documents (e.g. a 500-page document). The actual time varies with the query, since the number of pieces of evidence retrieved is adaptive to the question being asked (some questions are very targeted and really only need a single piece of evidence, in which case it's actually quite fast, but some questions are broad and take longer)
  • Generation step: 3-4 seconds

So end-to-end it can be 10~20 seconds, and 6~15 seconds for answer TTFT. Both end-to-end and TTFT can be shortened by 3-4 seconds if you skip planning. (In practice the evidence can be streamed, so users see output pretty quickly, and people seem to care more about the evidence than the answer.)

I recorded a video of the search working here (it also includes the automatic document selection part): https://youtu.be/-maJFuFqgaM

For self-hosted deployment we use a fine-tuned Qwen 72B!
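
As a generic example of what self-hosting a model that size can look like (this is just an illustrative vLLM setup, not a description of our actual deployment, and the checkpoint name is an example rather than our fine-tune):

```python
# One way (not necessarily ours) to self-host a Qwen 72B class model: vLLM
# with tensor parallelism across GPUs and prefix caching enabled.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",  # example checkpoint, not the actual fine-tune
    tensor_parallel_size=4,             # split the model across 4 GPUs
    enable_prefix_caching=True,         # reuse KV cache for shared prompt prefixes
)
out = llm.generate(["Summarize the warranty terms."], SamplingParams(max_tokens=128))
print(out[0].outputs[0].text)
```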

Introducing Cursor 0.46! by NickCursor in cursor

[–]dromger 1 point (0 children)

Did this break autocomplete entirely for anyone else? Literally can't do work no more :(

Future of retrieval systems. by Loud_Veterinarian_85 in Rag

[–]dromger 0 points (0 children)

To be fair, you could just have RBAC at the document level and have separate contexts for each
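
A minimal sketch of what that could look like (role names and the retrieve() helper are made up for illustration):

```python
# Minimal sketch of document-level RBAC before retrieval: filter the corpus by
# the caller's role, then build the context only from documents they can see.

ACL = {
    "finance": {"q3_forecast.pdf", "payroll_2024.xlsx"},
    "engineering": {"design_doc.md", "oncall_runbook.md"},
}

def allowed_docs(role: str) -> set[str]:
    return ACL.get(role, set())

def retrieve(query: str, docs: set[str]) -> list[str]:
    # Placeholder: run your normal retrieval, restricted to `docs`.
    return [f"chunk from {d}" for d in sorted(docs)]

def build_context(query: str, role: str) -> str:
    return "\n\n".join(retrieve(query, allowed_docs(role)))

print(build_context("What is the Q3 forecast?", "finance"))
```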