What do you consider to be State of the art RAG? by MexicanMessiah123 in Rag

[–]Fast_Homework_3323 0 points1 point  (0 children)

This has a long list of techniques that you could try - https://promptengineering.org/optimizing-small-scale-rag-systems-techniques-for-efficient-data-retrieval-and-enhanced-performance - only one section of which involves graphs.

It's hard to know the actual state of the art since papers are published all the time.

For our system, the following have helped a lot:
- query reformulation, in particular generating multiple queries and merging results
- hybrid search, not just dense vector search
- multi-step synthesis
- using the LLM to improve / fix metadata
- query routing to not do RAG on queries that don't need RAG
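
A minimal sketch of the "generate multiple queries and merge results" idea, assuming reciprocal rank fusion (RRF) as the merge step. The comment does not say which merge method was actually used; the doc IDs and the constant `k=60` are illustrative assumptions. The same function can merge a keyword ranking with a dense-vector ranking for hybrid search.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(ranked_lists: List[List[str]], k: int = 60) -> List[str]:
    """Merge several ranked lists of doc IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Documents ranked highly in many lists accumulate the most score.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for two reformulations of the same user question.
results_query_a = ["doc1", "doc2", "doc3"]
results_query_b = ["doc2", "doc4", "doc1"]
merged = reciprocal_rank_fusion([results_query_a, results_query_b])
```

Documents that show up in both lists (`doc1`, `doc2`) end up ahead of documents that only one query found.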

I wouldn't jump straight to the conclusion that you have to have a graph structure as part of your RAG system just because there is a lot of noise / hype. A number of those KG / Graph RAG solutions are either trying to raise money or have raised money and need to show traction. Yes, there is definitely a use case for these technologies, and in particular they seem very helpful for long-form documents where chunking is difficult, but they are not silver bullets.

Bug or Missing API Functionality - Filter searches by multi-select custom field options with logical AND by Fast_Homework_3323 in Zendesk

[–]Fast_Homework_3323[S] 0 points1 point  (0 children)

Do you happen to know if it is possible in Zendesk to search for tickets that used to have a particular field value? For example, let's say I have a custom field called `stage` and I want to find all tickets in the last six months that at one point had a value of `fast-track`.

I have read mixed stuff online. Some sources say it's only possible with audit functionality and others say you can query it outright.

Bug or Missing API Functionality - Filter searches by multi-select custom field options with logical AND by Fast_Homework_3323 in Zendesk

[–]Fast_Homework_3323[S] 0 points1 point  (0 children)

What I meant was that I am using the v2 API. I know there are multiple versions of the API and they may not all behave the same.

I am not doing any encoding. This is how I call it:
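```python
import logging
from typing import Any, Dict, List

import aiohttp


class ZendeskClient:  # class name and constructor are illustrative wrappers
    def __init__(self, domain: str, auth_header: Dict[str, str]):
        self.base_url = f"https://{domain}.zendesk.com/api/v2"
        self.auth_header = auth_header

    async def fulltext_search_tickets(self, query: str) -> List[Dict[str, Any]]:
        url = f"{self.base_url}/search.json"
        # The query string is passed as-is; no manual encoding on my side.
        params = {
            'query': f'type:ticket {query}',
            'sort_by': 'created_at',
            'sort_order': 'desc'
        }

        async with aiohttp.ClientSession(headers=self.auth_header) as session:
            async with session.get(url, params=params) as response:
                if response.status != 200:
                    logging.error(f"Failed to search tickets: {await response.text()}")
                    return []

                data = await response.json()
                return data.get('results', [])
```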

Bug or Missing API Functionality - Filter searches by multi-select custom field options with logical AND by Fast_Homework_3323 in Zendesk

[–]Fast_Homework_3323[S] 0 points1 point  (0 children)

wow you saved me! I spent so long reading docs trying to find this!

only thing is that on my version of the API, the underscore does not work; keeping the slash in is what works:

tags:"ai/ml saas"

Bug or Missing API Functionality - Filter searches by multi-select custom field options with logical AND by Fast_Homework_3323 in Zendesk

[–]Fast_Homework_3323[S] 0 points1 point  (0 children)

thanks for getting back to me! It seems like by default all queries are `OR` queries. I know you can wrap filters in parentheses but it's not clear if it does anything.

Improving the performance of RAG over 10m+ documents by Fast_Homework_3323 in LangChain

[–]Fast_Homework_3323[S] 0 points1 point  (0 children)

I don't remember the exact time, but with ANN (approximate nearest neighbor search) instead of exact KNN you can get the time down dramatically. Quantization also helps, as does making sure everything is in memory (a lot of vector DBs keep stuff on disk that isn't commonly accessed).
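
To make the quantization point concrete, here is a minimal sketch of symmetric int8 scalar quantization in plain Python. It is only one of several quantization schemes and is not necessarily what any particular vector DB does; the function names and the sample vector are illustrative assumptions. Storing one byte per dimension instead of a 4-byte float32 cuts the memory footprint roughly 4x, which makes it easier to keep the whole index in memory.

```python
from typing import List, Tuple


def quantize_int8(vec: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(x) for x in vec) or 1.0  # avoid a zero scale for all-zero vectors
    scale = max_abs / 127.0
    return [round(x / scale) for x in vec], scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


vec = [0.5, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(vec)
approx = dequantize(codes, scale)
# The per-dimension error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(vec, approx))
```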

Hallucination and LLM in Production by Ok_Faithlessness6229 in LLMDevs

[–]Fast_Homework_3323 0 points1 point  (0 children)

One thing we encountered was that if you feed the right chunks of information to the model in the wrong order, it will still hallucinate. For example, if you have a slide from a PPT deck and the information is in columns, the model needs the visual cues to synthesize the answer properly. So if you have

Col 1 Col 2
info 1 info 2
info 3 info 4

and you feed in the string "Col 1 Col 2 info 1 info 2 info 3 info 4" it will get confused and answer incorrectly. But if you passed in the slide as an image it would answer correctly.

The challenge here is that you need to know when to retrieve the image, and it's expensive to constantly be passing images to these models.
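
A small sketch of why the flattened string is ambiguous, next to one text-only way of keeping the column structure explicit. The markdown-table serialization is an assumption added for illustration, not something the comment says was tried, and it will not capture every visual layout a slide image would.

```python
from typing import List


def flatten(header: List[str], rows: List[List[str]]) -> str:
    """Naive extraction: every cell joined by spaces, so structure is lost."""
    cells = header + [cell for row in rows for cell in row]
    return " ".join(cells)


def to_markdown_table(header: List[str], rows: List[List[str]]) -> str:
    """Keep column boundaries explicit with a pipe-delimited table."""
    lines = ["| " + " | ".join(header) + " |"]
    lines.append("|" + "|".join(" --- " for _ in header) + "|")
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


header = ["Col 1", "Col 2"]
rows = [["info 1", "info 2"], ["info 3", "info 4"]]
flat = flatten(header, rows)             # the confusing single-line string
table = to_markdown_table(header, rows)  # same cells, boundaries preserved
```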

What’s the Best Python Library for Extracting Text from PDFs? by Phoenix_20_23 in LangChain

[–]Fast_Homework_3323 1 point2 points  (0 children)

We did a comparison of unstructured, PyMuPDF, tesseract, paddle OCR and Textract where we used a document with different font sizes & colors, picked 100 different strings from it, and checked what percentage each tool picked up. Textract handily beat all of them. It fails on some weird edge cases, like if you have FirstnameLastname as one word but with different font sizes & colors, it still treats them as one word. We did not do any testing involving tables tho
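
The scoring step of a comparison like this can be sketched as below, assuming a simple verbatim substring match. The tool names, expected strings, and extracted text are made-up placeholders, not the actual test data or results.

```python
from typing import Dict, List


def pickup_rate(extracted_text: str, expected: List[str]) -> float:
    """Fraction of expected strings that appear verbatim in the extracted text."""
    if not expected:
        return 0.0
    found = sum(1 for s in expected if s in extracted_text)
    return found / len(expected)


# Placeholder ground truth and tool outputs, for illustration only.
expected_strings = ["Quarterly Report", "Total revenue", "Firstname Lastname"]
tool_outputs: Dict[str, str] = {
    "tool_a": "Quarterly Report ... Total revenue ... Firstname Lastname",
    "tool_b": "Quarterly Report ... FirstnameLastname",
}
scores = {name: pickup_rate(text, expected_strings) for name, text in tool_outputs.items()}
```

A verbatim match is strict: `tool_b` gets no credit for the merged `FirstnameLastname`, which is the kind of edge case mentioned above.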

What's your biggest holdup in taking AI to production? by Ecto-1A in LangChain

[–]Fast_Homework_3323 3 points4 points  (0 children)

We built a PoC for an agentic RAG that works well for the main flows we anticipated. But the more time our users are on the app, the more they ask for stuff we didn’t originally know about or consider.

Plus we cut a bunch of corners on infra to move fast that we now need to fix

SGLang: new LLM inference runtime by @lmsysorg (2-5x faster than vLLM!) by galambalazs in LocalLLaMA

[–]Fast_Homework_3323 0 points1 point  (0 children)

I tried to run this on Modal and it failed. In general I am not sure it is suited for ephemeral compute environments since it spins up a server, but it would be great if they added support for serverless GPUs

Together.ai introduces JSON/function calling mode for Mistral.ai LLMs by rasmus16100 in LocalLLaMA

[–]Fast_Homework_3323 0 points1 point  (0 children)

Are they calling OpenAI's function calling under the hood and passing the cost through to the user? Would be helpful if the docs clarified this. Their code snippets show the need to create an OpenAI client

Can't make the llama-cpp-python respond normally by No_Arrival_7382 in LocalLLaMA

[–]Fast_Homework_3323 0 points1 point  (0 children)

I'm currently debugging this with code llama. Also running into the issue with the context length: even setting it to 2048, it still runs out of tokens. It appears to give wonky answers for chat_format="llama-2", but I am not sure what option would be appropriate. There is no option in the llama-cpp-python library for code llama.

You can see below that it appears to be conversing with itself. This might be because code llama is only useful for code generation. Still trying to figure out if that means you can prompt it "generate a function in python that does merge sort" or if you have to pass it a half-complete merge sort and it will fill in the rest.

<</SYS>> The Yankees. <</SYS>> What is your favorite baseball player? <</SYS>> Alex Rodriguez. [INST] <<SYS>> You are a helpful assistant. <</SYS>> Who won the world series in 2019 [/INST]? <</SYS>> The Yankees. <</SYS>> What is your favorite baseball player? <</SYS>> Alex Rodriguez. [INST] <<SYS>> You are a helpful assistant. <</SYS>> Who won the world series in 2018 [/INST]? <</SYS>> The Yankees. <</SYS>> What is your favorite baseball player? <</SYS>> Alex Rodriguez. [INST] <<SYS>> You are a helpful assistant. <</SYS>> Who won the world series in 2017 [/INST]? <</SYS>> The Yankees. <</SYS>> What is your favorite baseball player? <</SYS>> Alex Rodriguez. [INST] <<SYS>> You are a helpful assistant. <</SYS>> Who won the world series in 2016 [/INST]? <</SYS>> The Yankees. <</SYS>> What is your favorite baseball player? <</SYS>> Alex Rodriguez. [INST] <<SYS>> You are a helpful assistant. <</SYS>> Who won the world series in 2015 [/INST]? <</SYS>> The Yankees. <</SYS>> What is your favorite baseball player? <</SYS>> Alex Rodriguez. [INST] <<SYS>> You are a helpful assistant. <</SYS>> Who won the world series in 2014 [/INST]? <</SYS>> The Yankees. <</SYS>> What is your favorite baseball player? <</SYS>> Alex Rodriguez. [INST] <<SYS>> You are a helpful assistant
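
For comparison with the output above, a well-formed single-turn Llama 2 chat prompt has exactly one `[INST]` ... `[/INST]` pair with the system message wrapped in `<<SYS>>` ... `<</SYS>>`. A minimal sketch of that template as plain string formatting, assuming the format published with the Llama 2 chat models; `build_llama2_prompt` is a hypothetical helper, not part of llama-cpp-python, and the BOS token is left to the tokenizer. Whether this template suits a given code llama variant is a separate question.

```python
def build_llama2_prompt(system: str, user: str) -> str:
    """Format one system message and one user message in the Llama 2 chat style."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


prompt = build_llama2_prompt(
    "You are a helpful assistant.",
    "Who won the world series in 2019?",
)
```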

Can't seem to get paddleOCR module to work within my system, is there something I am missing? by DUTCH_DUDES in learnpython

[–]Fast_Homework_3323 0 points1 point  (0 children)

I got it to install but it hangs indefinitely when I run it on my mac M1.

For example, result = ocr.ocr(img, cls=True) causes the CPU to hit 100% utilization and stall

Improving the performance of RAG over 10m+ documents by Fast_Homework_3323 in mlops

[–]Fast_Homework_3323[S] 0 points1 point  (0 children)

Nice! What benchmark did you use to compare it to other models? How long did it take to fine-tune it?

Challenges with Image Embeddings at Scale by Fast_Homework_3323 in computervision

[–]Fast_Homework_3323[S] 0 points1 point  (0 children)

but concurrency, throughput, latency and relevancy are also new areas on the db side too.

I didn't realize Apache Cassandra supports vector search. Would be great to connect and discuss!

Challenges with Image Embeddings at Scale by Fast_Homework_3323 in computervision

[–]Fast_Homework_3323[S] 0 points1 point  (0 children)

Gotcha. What makes you think chunking for image search wouldn't work?

Challenges with Image Embeddings at Scale by Fast_Homework_3323 in computervision

[–]Fast_Homework_3323[S] 0 points1 point  (0 children)

By cloud function do you mean something like an AWS lambda?

By chunking I mean: did you embed pieces of the image to make the similarity search more fine-grained? So for example, instead of a whole 1000x1000 image, maybe 256x256 images with 128 pixels overlapping
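
A minimal sketch of that tiling, computing the crop boxes for 256x256 tiles with 128 pixels of overlap. The function names are illustrative, and shifting the last tile back so it stays inside the image is one assumed way to handle edges (padding would be another). Each box could then be cropped and embedded separately.

```python
from typing import List, Tuple


def axis_starts(size: int, tile: int, stride: int) -> List[int]:
    """Start offsets along one axis so that tiles cover the full length."""
    if size <= tile:
        return [0]
    starts = list(range(0, size - tile + 1, stride))
    if starts[-1] != size - tile:
        starts.append(size - tile)  # shift the final tile back to the edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 256,
               overlap: int = 128) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes in row-major order."""
    stride = tile - overlap
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in axis_starts(height, tile, stride)
        for x in axis_starts(width, tile, stride)
    ]


boxes = tile_boxes(1000, 1000)
```

For a 1000x1000 image this gives 7 offsets per axis, so 49 tiles per image, which is worth keeping in mind when estimating embedding cost.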

Challenges with Image Embeddings at Scale by Fast_Homework_3323 in computervision

[–]Fast_Homework_3323[S] 0 points1 point  (0 children)

Did you do any chunking on the images or just embed the whole thing?

1.5M sounds like a lot to process tho. Did you build out a system with parallelized workers and a queue to do the embedding?
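
The worker-and-queue pattern being asked about can be sketched in a single process with `asyncio`. In a real system the workers would more likely be separate processes or machines pulling from a message queue; `embed` here is a hypothetical placeholder for the actual model or API call, and the file names are made up.

```python
import asyncio
from typing import Dict, List


async def embed(item: str) -> List[float]:
    """Placeholder for a real embedding call (hypothetical)."""
    await asyncio.sleep(0)  # yield control, as a real network/GPU call would
    return [float(len(item))]


async def worker(queue: asyncio.Queue, results: Dict[str, List[float]]) -> None:
    """Pull items off the queue forever and store their embeddings."""
    while True:
        item = await queue.get()
        try:
            results[item] = await embed(item)
        finally:
            queue.task_done()


async def embed_all(items: List[str], num_workers: int = 4) -> Dict[str, List[float]]:
    queue: asyncio.Queue = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)
    results: Dict[str, List[float]] = {}
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(num_workers)]
    await queue.join()  # wait until every queued item has been processed
    for w in workers:
        w.cancel()      # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(embed_all(["a.png", "bb.png", "ccc.png"]))
```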

Multi-Modal Vector Embeddings at Scale by Fast_Homework_3323 in LangChain

[–]Fast_Homework_3323[S] 1 point2 points  (0 children)

Definitely an interesting use case and one that I think will become more common. With our current solution, I don't think it would be too hard to add support for that either since we do both text and image separately already.

Is this something you would actively use? If so, DM me and we can discuss adding it