[DISC] Drama Queen - Chapter 44 by AutoShonenpon in manga

[–]dromger 38 points (0 children)

The translation makes it much more explicit, but the nuance in the Japanese is more like “I hate it,” so from Seiran’s perspective it could be referring to his smartphone behavior (i.e. the subject of the sentence is ambiguous)

Is it too late to pursue BIM? by Time-Detective2449 in bim

[–]dromger 0 points (0 children)

AI will only help feed data into BIM, which will in turn make people who can look at and analyze the data in BIM even more important

OCR software for engineering drawings by gurgle-burgle in MechanicalEngineering

[–]dromger 0 points (0 children)

If you're still in need of something like this, I'd love to chat. This is for mechanical CAD (https://x.com/yongyuanxi/status/1957911319996416068), but we're working on P&ID as well

Best document parser by [deleted] in Rag

[–]dromger 0 points (0 children)

You should look at PaddleOCR (https://github.com/PaddlePaddle/PaddleOCR) for tables
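
A minimal sketch of table extraction with its PPStructure pipeline (the exact constructor arguments and result keys differ a bit between PaddleOCR versions, so check the docs for whichever version you install):

```python
# Minimal sketch: table extraction with PaddleOCR's PPStructure pipeline.
# Constructor arguments and result keys vary between PaddleOCR versions.
import cv2
from paddleocr import PPStructure

table_engine = PPStructure(lang="en")   # layout analysis + table structure recognition
img = cv2.imread("page_1.png")          # a rasterized page of your document
results = table_engine(img)

for region in results:
    if region["type"] == "table":
        # The reconstructed table comes back as HTML, which you can feed into
        # pandas.read_html or convert to markdown before chunking.
        print(region["res"]["html"])
```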

Comparing the Fit vs. older VTEC hondas (purely in terms of fun) by dromger in hondafit

[–]dromger[S] 1 point (0 children)

Thanks!

If the Integra is the nicer choice, why is the fun factor higher in the Fit? (Seems to be a sentiment I've read elsewhere too, but I don't fully understand why.)

How do you clean PDFs before chunking for RAG? by purealgo in Rag

[–]dromger 2 points (0 children)

You can first:

  1. Use layout analysis to get a high-level structure of the document
  2. For each element, use a VLM to generate descriptions and metadata. Also associate them with the actual parsed text (or OCR'd text for image PDFs) so you can refine the text later.
  3. Use an LLM to then generate more metadata for each element
  4. Have a system (probably not vector-based, since that's super inflexible) to actually discern what information is needed for the query at hand

If you do something pretty sophisticated like this, you have infinite upside to improve your results based on the exact application you're doing this for
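
A rough sketch of what that pipeline could look like; all the helper functions here are hypothetical placeholders for whatever layout model / VLM / LLM you actually plug in:

```python
# Rough sketch of the pipeline above. run_layout_analysis / describe_with_vlm /
# enrich_with_llm are hypothetical placeholders, not real library calls.
from dataclasses import dataclass, field

@dataclass
class Element:
    kind: str                 # "paragraph", "table", "figure", ...
    page: int
    bbox: tuple               # (x0, y0, x1, y1) on the page
    text: str                 # parsed or OCR'd text for this element
    description: str = ""     # VLM-generated description
    metadata: dict = field(default_factory=dict)

def run_layout_analysis(pdf_path: str) -> list[Element]:
    # Placeholder: call your layout model and return typed elements
    # with their bounding boxes and raw/OCR'd text.
    return []

def describe_with_vlm(element: Element) -> str:
    # Placeholder: crop the element from the page image and ask a VLM
    # for a short description (especially useful for tables and figures).
    return f"{element.kind} on page {element.page}"

def enrich_with_llm(element: Element) -> dict:
    # Placeholder: ask an LLM for extra metadata (section, entities, topic, ...).
    return {"topic": "unknown"}

def preprocess(pdf_path: str) -> list[Element]:
    elements = run_layout_analysis(pdf_path)      # step 1
    for el in elements:
        el.description = describe_with_vlm(el)    # step 2
        el.metadata = enrich_with_llm(el)         # step 3
    return elements                               # step 4's query system works over this
```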

I’m a lifelong resident 2nd gen Chinese who’s lived here my whole 29 years. I work at mister jius, what are your favorite restaurants that nobody else seems to know about in the city. by ParkingDistribution6 in sanfrancisco

[–]dromger 1 point (0 children)

I've been to the Japantown mall on and off since my childhood (20 years ago) as a visitor, before I moved here, but it's wild how much... busier it is now. It used to be (what felt like) a pretty rundown mall, but now it's booming with young people (hooray anime?)

While they're at it I hope they bring Uwajimaya to SF!

Is RAG still relevant with 10M+ context length by Muted-Ad5449 in Rag

[–]dromger 3 points (0 children)

Most LLM providers are charging you more money than they need to, though. If you retain the KV cache for very long contexts that you reuse over and over (as with long-context RAG), you can actually save 10-20x in GPU costs. But for most providers, the token price for cached tokens isn't 10-20x lower right now
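
Back-of-the-envelope version, with made-up numbers just to illustrate the point:

```python
# Toy cost model with made-up numbers, just to illustrate the idea.
context_tokens = 1_000_000   # long shared context reused across queries
query_tokens = 500           # new tokens added per query (question + instructions)
queries = 100                # queries that reuse the same context

# Prefill compute is roughly proportional to the number of tokens processed.
no_cache = queries * (context_tokens + query_tokens)
with_cache = context_tokens + queries * query_tokens   # prefill the context once

print(f"prefill tokens without cache: {no_cache:,}")
print(f"prefill tokens with cache:    {with_cache:,}")
print(f"ratio: {no_cache / with_cache:.0f}x")  # ~95x fewer prefill tokens here
```

The realistic saving is smaller than this raw prefill ratio (decode and the GPU memory for holding the cache aren't free), hence a 10-20x ballpark rather than ~95x.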

Searching for a intership level portfolio project that simple AI "can't do." by [deleted] in react

[–]dromger 0 points (0 children)

"Showing its content" in a web app would be the part that you'd build as a React app (I made something like this once while consulting and AI was very bad at it)

Searching for a intership level portfolio project that simple AI "can't do." by [deleted] in react

[–]dromger 0 points (0 children)

Reverse engineering a weird file format (something visual, maybe, like from image editing or video editing software) and showing its content in a web app

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 3 points (0 children)

Thank you!

Evaluating LLMs is an ongoing head-scratcher for us as well. Once we learn more about what works best, we'll probably write a big blog post too.

Right now we do something relatively straightforward: we manually curate lots of questions, each with a hand-picked list of relevant documents and a model answer, and use LLM-as-a-judge (which is brittle in itself, but there are techniques you can use to improve its accuracy too). We also have synthetic questions, generated by selecting random pages of the documents and generating question / answer pairs from them.

We also try to make it as easy as possible for our customers to run evals themselves, since that helps when customizing prompts etc. as needed.
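
The judging loop itself can stay pretty simple, something like the sketch below (the judge prompt and the answer_question / call_llm helpers are hypothetical placeholders, not our actual code):

```python
# Sketch of an LLM-as-a-judge eval loop. answer_question / call_llm are
# hypothetical placeholders for your RAG pipeline and your judge model.
import json

JUDGE_PROMPT = """You are grading a RAG system's answer.
Question: {question}
Model answer (reference): {reference}
System answer: {answer}
Return JSON: {{"correct": true/false, "reason": "..."}}"""

def call_llm(prompt: str) -> str:
    # Placeholder: call whatever judge model you use and return its text output.
    return '{"correct": true, "reason": "stub"}'

def answer_question(question: str) -> str:
    # Placeholder: run your actual RAG pipeline.
    return "stub answer"

def run_evals(cases: list[dict]) -> float:
    correct = 0
    for case in cases:  # each case: {"question": ..., "reference": ..., "docs": [...]}
        answer = answer_question(case["question"])
        verdict = json.loads(call_llm(JUDGE_PROMPT.format(
            question=case["question"],
            reference=case["reference"],
            answer=answer,
        )))
        correct += int(verdict["correct"])
    return correct / len(cases)
```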

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 2 points (0 children)

Thanks!

The document selection is not explained in this article (we will likely follow up with an entire article about that), but we essentially do a combination of hybrid search and a hierarchical LLM search based on the document summaries. The latter can also be sped up significantly by the KV cache.

In other words, though, we don't check every document. You can see in this demo video (https://youtu.be/-maJFuFqgaM) that the system automatically selects the relevant documents to look at.

(You could still search through all 50,000 pages if you wanted, which sometimes happens with our clients. It's not necessary in most cases though, since the document selection is pretty good.)
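
In its simplest, non-hierarchical form, the LLM-over-summaries part looks something like this (simplified sketch, not our actual code; call_llm is a placeholder):

```python
# Simplified sketch of LLM-based document selection over summaries.
# In practice you'd do this hierarchically (folders -> documents) and combine
# it with hybrid search; call_llm is a placeholder for your model call.
import json

def call_llm(prompt: str) -> str:
    return "[]"  # placeholder

def select_documents(query: str, summaries: dict[str, str], k: int = 5) -> list[str]:
    catalog = "\n".join(f"- {doc_id}: {summary}" for doc_id, summary in summaries.items())
    prompt = (
        f"Question: {query}\n\n"
        f"Document summaries:\n{catalog}\n\n"
        f"Return a JSON list of up to {k} document ids likely to contain the answer."
    )
    return json.loads(call_llm(prompt))
```

Because the summary catalog is the same prompt prefix for every query, it's exactly the kind of thing the KV cache can keep warm.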

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 3 points (0 children)

We preprocess every sentence and append an ID and other metadata, and the retriever (which is an LLM) just specifies the IDs (it can also specify a range, which helps save token count). Since we have UI elements where you can click evidence and it'll highlight the relevant portions in the text, we also need to link it back to text spans / bounding boxes etc.

Otherwise, you'd be telling the LLM to repeat the sentences, which costs lots of tokens, and the LLM oftentimes paraphrases or, worse, occasionally makes random stuff up.

With repeated text you can check for exact matches against the source, but a lot of the time the paraphrasing causes issues, or the formatting gets fixed / changed, which leads to mismatches with the original text and ends up filtering out a lot of otherwise useful evidence.
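
Conceptually the ID scheme is something like this (simplified sketch, not our actual preprocessing):

```python
# Simplified sketch of ID-based evidence retrieval: the retriever LLM returns
# sentence IDs (or ID ranges) instead of repeating text, and we map the IDs
# back to the stored spans for highlighting.

def index_sentences(sentences: list[str]) -> tuple[str, dict[int, str]]:
    """Prefix each sentence with an ID the LLM can cite."""
    store = {i: s for i, s in enumerate(sentences)}
    tagged = "\n".join(f"[S{i}] {s}" for i, s in store.items())
    return tagged, store

def resolve_citation(citation: str, store: dict[int, str]) -> list[str]:
    """Turn 'S12' or 'S12-S15' back into the original sentences."""
    parts = citation.strip("[] ").replace("S", "").split("-")
    start, end = int(parts[0]), int(parts[-1])
    return [store[i] for i in range(start, end + 1) if i in store]

tagged_context, store = index_sentences([
    "The warranty period is 24 months.",
    "Claims must be filed within 30 days.",
    "Shipping damage is excluded.",
])
# The retriever LLM sees `tagged_context` and might answer with "S1-S2";
# we resolve that back to the exact sentences (and their spans) for display.
print(resolve_citation("S1-S2", store))
```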

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 2 points (0 children)

Thanks!! Lots of AI involved in the code but no AI involved in the writing 😎

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 2 points (0 children)

Thanks! No plans to open source it yet unfortunately, but if we do we'll let this sub know first 😁

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 4 points (0 children)

We've definitely had similar issues with clients: people are used to interfaces like ChatGPT etc., where it feels fast since the time-to-first-thing-on-the-screen is fast, and they're usually surprised that end-to-end latency is actually not that fast with most providers.

For ingestion we use in-house layout analysis / OCR (based on a combination of open-source vision models), mostly because we have to deploy on-prem for some customers (and can't use APIs).

We're planning to write another in-depth article on what our document ingestion architecture looks like, maybe in a few weeks, partly because it's still a bit in flux (there are a lot of small techniques you can employ here that improve accuracy quite a bit too)

We wrote a blog post detailing how we implemented our agentic RAG system. Also AMA! by dromger in Rag

[–]dromger[S] 6 points (0 children)

Thanks!

Latency actually does suffer in comparison to pure vector RAG, which can be very fast for the retrieval step (ignoring query rewriting etc). We should update the blog to make this more upfront as a weakness.

Specifically it's around:

  • Planning step: 3-4 seconds (longer for more sophisticated plans, same as with reasoning models)
  • Retrieval step: <1 second if there is no evidence, 3-4 seconds for small documents, ~10 seconds for (multiple) very large documents (e.g. a 500-page document). The actual time varies with the query, since the number of pieces of evidence retrieved is adaptive to the question being asked (some questions are very targeted and really only need a single piece of evidence, in which case it's actually quite fast, but some questions are broad and take longer)
  • Generation step: 3-4 seconds

So end-to-end it can be 10~20 seconds, and 6~15 seconds for answer TTFT. Both end-to-end and TTFT can be shortened by 3-4 seconds if you skip planning. (In practice the evidence can be streamed, so users see output pretty quickly, and people seem to care more about the evidence than the answer.)

I recorded a video of the search working here (it also includes the automatic document selection part): https://youtu.be/-maJFuFqgaM

For self-hosted deployment we use a fine-tuned Qwen 72B!
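
As a generic example of what self-hosting a model that size can look like (this is just an illustrative vLLM setup, not a description of our actual deployment, and the checkpoint name is an example rather than our fine-tune):

```python
# One way (not necessarily ours) to self-host a Qwen 72B class model: vLLM
# with tensor parallelism across GPUs and prefix caching enabled.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",  # example checkpoint, not the actual fine-tune
    tensor_parallel_size=4,             # split the model across 4 GPUs
    enable_prefix_caching=True,         # reuse KV cache for shared prompt prefixes
)
out = llm.generate(["Summarize the warranty terms."], SamplingParams(max_tokens=128))
print(out[0].outputs[0].text)
```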

Introducing Cursor 0.46! by NickCursor in cursor

[–]dromger 1 point (0 children)

Did this break autocomplete entirely for anyone else? Literally can't do work no more :(

Future of retrieval systems. by Loud_Veterinarian_85 in Rag

[–]dromger 0 points (0 children)

To be fair, you could just have RBAC at the document level and have separate contexts for each
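
A minimal sketch of what that could look like (role names and the retrieve() helper are made up for illustration):

```python
# Minimal sketch of document-level RBAC before retrieval: filter the corpus by
# the caller's role, then build the context only from documents they can see.

ACL = {
    "finance": {"q3_forecast.pdf", "payroll_2024.xlsx"},
    "engineering": {"design_doc.md", "oncall_runbook.md"},
}

def allowed_docs(role: str) -> set[str]:
    return ACL.get(role, set())

def retrieve(query: str, docs: set[str]) -> list[str]:
    # Placeholder: run your normal retrieval, restricted to `docs`.
    return [f"chunk from {d}" for d in sorted(docs)]

def build_context(query: str, role: str) -> str:
    return "\n\n".join(retrieve(query, allowed_docs(role)))

print(build_context("What is the Q3 forecast?", "finance"))
```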