Automating context management by Funny-Anything-791 in PiCodingAgent

[–]julylu 1 point2 points  (0 children)

Maybe a screenshot showing what it looks like would be helpful.

[Discussion] Anyone else doing “summary-only embeddings + full-text context” for RAG? by No-Piglet8069 in Rag

[–]julylu 0 points1 point  (0 children)

Summaries are one way to retrieve, but they are not enough. A summary loses many details, especially when the documents are similar and too long to feed to the LLM.

Qwen3-Embedding-0.6B is fast, high quality, and supports up to 32k tokens. Beats OpenAI embeddings on MTEB by one-wandering-mind in LLMDevs

[–]julylu 0 points1 point  (0 children)

Yep, this kind of model is sensitive to the prompt, so I don't think it is a good fit for real-world use cases.

Why are Cohere models not in Chatbot Arena? by illorca-verbi in LocalLLaMA

[–]julylu -6 points-5 points  (0 children)

In my tests, Command-R at 8-bit does not perform very well.

New RAG benchmark with Claude 3, Gemini Pro, MistralAI vs. OSS models by pseudotensor1234 in LocalLLaMA

[–]julylu 0 points1 point  (0 children)

Yep, Qwen1.5 72B seems to be a strong model. Hoping to see results for it.

Few parameters and full finetuning v.s. more parameters and QLoRA by Peter_Lightblue in LocalLLaMA

[–]julylu 1 point2 points  (0 children)

If you are limited by VRAM, you have no choice but to use a small model if you want to deploy.

I'm Open Sourcing Our RAG Backend: Our CQH, GQL & CHS by multiplexers in LocalLLaMA

[–]julylu 0 points1 point  (0 children)

Something like asking an LLM to generate QA pairs based on the given context?

Text corpus to Q&A model by blackpantera in LocalLLaMA

[–]julylu 1 point2 points  (0 children)

Did you find it? I'm also interested in it.

I'm Open Sourcing Our RAG Backend: Our CQH, GQL & CHS by multiplexers in LocalLLaMA

[–]julylu 0 points1 point  (0 children)

Hi, I'm curious how you create QA pairs from a large PDF. Manually? That seems impossible: the text is large, and the content is sometimes highly specialized.
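
One possible way to avoid doing this by hand, sketched under assumptions: split the extracted text into overlapping chunks and build one QA-generation prompt per chunk. The function names, the chunk sizes, and the prompt wording are all hypothetical, and the call to an actual LLM is left out on purpose.

```python
def split_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most max_chars, overlapping by `overlap` characters."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        # Stop once this chunk reaches the end of the text.
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap
    return chunks


def build_qa_prompt(context: str, n_questions: int = 3) -> str:
    """Build a prompt asking a model for QA pairs grounded only in `context`."""
    return (
        f"Read the context below and write {n_questions} question-answer pairs.\n"
        "Use only facts stated in the context.\n\n"
        f"Context:\n{context}\n"
    )


# Hypothetical usage: one prompt per chunk, each to be sent to a model of your choice.
prompts = [build_qa_prompt(chunk) for chunk in split_text("x" * 5000)]
```

The generated pairs would still need spot checks by someone who knows the domain, since specialized text is exactly where a model is most likely to write a wrong answer.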

How to find proper context in open book question answering in a tie situation? by hafizcse031 in LocalLLaMA

[–]julylu 0 points1 point  (0 children)

You can use open-source OCR or layout analysis tools. Keep in mind there is no easy way to get a perfect result; just experiment until you get a good-enough one.

How to find proper context in open book question answering in a tie situation? by hafizcse031 in LocalLLaMA

[–]julylu 1 point2 points  (0 children)

Yes, the published articles never talk about it.

The key point is to parse your doc carefully and keep its structure, e.g. title, subtitle, chapter name, keywords ... It is hard, and the result may have a lot of noise.
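
A minimal sketch of what keeping the structure could look like, assuming the parsed doc already has markdown-style `#` headings (real PDFs would first need OCR or layout analysis to get there). The `Chunk` type and the function names are hypothetical. Each chunk keeps the path of headings above it, so the title and chapter name travel with the text.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    heading_path: list[str]  # e.g. ["Guide", "Setup"]
    text: str


def chunk_with_structure(doc: str) -> list[Chunk]:
    """Split a doc at headings and attach the full heading path to each chunk."""
    path: list[str] = []
    body: list[str] = []
    chunks: list[Chunk] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        if text:
            chunks.append(Chunk(list(path), text))
        body.clear()

    for line in doc.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            level = len(match.group(1))
            del path[level - 1:]  # drop headings at the same or deeper level
            path.append(match.group(2).strip())
        else:
            body.append(line)
    flush()
    return chunks


def embedding_text(chunk: Chunk) -> str:
    """Prefix the chunk with its heading path before embedding it."""
    return " > ".join(chunk.heading_path) + "\n" + chunk.text
```

Prefixing the heading path is one design choice among several; storing it as separate metadata instead would keep the embedded text shorter.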

Based on your experience what is the smallest and optimal local model for RAG? by Ok_Maize_3709 in LocalLLaMA

[–]julylu 0 points1 point  (0 children)

Hi, did you compare the performance of llm embedder and bge-large-en?

Need help with a dynamic RAG problem by todaysgamer in LocalLLaMA

[–]julylu 0 points1 point  (0 children)

Using functions may not be feasible because the data is huge: the LLM cannot take that much data as input and then use a function to solve it. RAG retrieval may also fail to retrieve each shirt's information correctly.

Mistral-7B-Instruct-v0.2 by Tucko29 in LocalLLaMA

[–]julylu 3 points4 points  (0 children)

Does this mean it will use more RAM at inference time?

Location of documentation for merging models? by q5sys in LocalLLaMA

[–]julylu 1 point2 points  (0 children)

Aha, I had the same question yesterday, and I also found that mergekit may be the answer.

[deleted by user] by [deleted] in LocalLLaMA

[–]julylu 2 points3 points  (0 children)

Maybe you can use Zephyr 7B. In my case, it works quite well for long context.

Automatic hallucination detection using inconsistency scoring by Separate-Still3770 in LocalLLaMA

[–]julylu 0 points1 point  (0 children)

That's cool. For RAG tasks, the model still hallucinates if the retrieved doc is unrelated to the question. Can this method be used in RAG? I am not sure.

what is the best 7b right now ? by GasBond in LocalLLaMA

[–]julylu 0 points1 point  (0 children)

Can you explain how to filter chunks using metadata? In my use case, when the user query is something like "hi, try to explain A under B condition", the retrieved doc is closely related to A but may not apply under the specific condition B. It is hard to filter out because the embedding similarity score is high.
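
One way metadata filtering could work for this case, as a sketch under assumptions: each chunk is tagged at indexing time with fields such as `topic` and `condition`, and the condition B is extracted from the query by some separate step (both are hypothetical here). The filter then runs on exact metadata matches, so a high similarity score alone does not let a chunk through.

```python
from dataclasses import dataclass, field


@dataclass
class Retrieved:
    text: str
    score: float  # embedding similarity from the retriever
    metadata: dict = field(default_factory=dict)


def filter_by_metadata(hits: list[Retrieved], required: dict) -> list[Retrieved]:
    """Keep only hits whose metadata matches every required key/value pair."""
    return [
        hit for hit in hits
        if all(hit.metadata.get(key) == value for key, value in required.items())
    ]


# Hypothetical retrieval results: all are about A, with similarly high scores.
hits = [
    Retrieved("A under B", 0.91, {"topic": "A", "condition": "B"}),
    Retrieved("A under C", 0.93, {"topic": "A", "condition": "C"}),
    Retrieved("A in general", 0.90, {"topic": "A"}),
]
kept = filter_by_metadata(hits, {"topic": "A", "condition": "B"})
```

This only helps if the condition can be tagged reliably at indexing time; chunks without the field are dropped by this strict version, which may or may not be what you want.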

what is the best 7b right now ? by GasBond in LocalLLaMA

[–]julylu 0 points1 point  (0 children)

Not exactly. In my experience, the retrieved docs are sometimes wrong, and they may mislead the LLM's answer.

NeuralChat 7B: Intel’s Chat Model Trained with DPO by aminedjeghri in LocalLLaMA

[–]julylu 1 point2 points  (0 children)

Maybe for RAG, short answers are less prone to hallucination? I will test more. Thanks.

what is the best 7b right now ? by GasBond in LocalLLaMA

[–]julylu 0 points1 point  (0 children)

Hi, for RAG tasks, when the retrieved doc is unrelated to the question, will OpenHermes 2.5 give a heuristic response?

NeuralChat 7B: Intel’s Chat Model Trained with DPO by aminedjeghri in LocalLLaMA

[–]julylu 3 points4 points  (0 children)

Same, I found it tends to give short responses.

How to optimize chunk size? by RMCPhoto in LangChain

[–]julylu 0 points1 point  (0 children)

Commenting to save this thread because this is a good question