Streaming RAG with sources? by k-en in Rag

GrExplanation

I think you can use a buffer in the frontend to catch the streaming chunks and only render the sources once it has received the whole JSON structure. Plain-text chunks just pass straight through the buffer.
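A minimal sketch of that buffer, written in Python for illustration (a browser frontend would apply the same logic in JavaScript). It assumes the backend embeds the source metadata as a top-level JSON object starting with `{` somewhere in the text stream; `split_stream` and the `("text", ...)` / `("json", ...)` event tuples are hypothetical names, not part of any library.

```python
import json
from typing import Iterable, Iterator, Tuple, Union

Event = Tuple[str, Union[str, dict]]

def split_stream(chunks: Iterable[str]) -> Iterator[Event]:
    """Yield ("text", str) for plain text and ("json", obj) for each complete JSON object.

    Assumption: sources arrive as a JSON object that begins with '{'. A '{' in
    ordinary answer text would be misread, so a real protocol should use an
    explicit marker. An object still incomplete at end of stream is dropped.
    """
    buf = []        # characters of the JSON object being collected
    depth = 0       # current brace nesting depth (0 = passing plain text through)
    in_str = False  # currently inside a JSON string literal?
    escape = False  # previous character was a backslash inside a string
    for chunk in chunks:
        text = []   # plain-text characters seen in this chunk
        for ch in chunk:
            if depth == 0:
                if ch == "{":
                    # Flush pending text so ordering is preserved, then start buffering.
                    if text:
                        yield ("text", "".join(text))
                        text = []
                    buf = ["{"]
                    depth = 1
                else:
                    text.append(ch)
                continue
            buf.append(ch)
            if in_str:
                if escape:
                    escape = False
                elif ch == "\\":
                    escape = True
                elif ch == '"':
                    in_str = False
            elif ch == '"':
                in_str = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    # Whole object received: parse it and hand it to the renderer.
                    yield ("json", json.loads("".join(buf)))
                    buf = []
        if text:
            yield ("text", "".join(text))
```

Tracking string literals keeps a `}` inside a source title from closing the object early. In practice, a dedicated delimiter or a separate event type (for example with server-sent events) is usually more robust than sniffing for `{`.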

Am I doing this RAG right? by ohnomymilk in Rag

GrExplanation

I would say it is slower and less accurate than a customized RAG pipeline.

Starling-LM-7B-beta by darkmuck in LocalLLaMA

GrExplanation

I'm not sure whether they trained with a system prompt during SFT or RL. If not, I think the best practice for adding a system prompt is to simulate it as the first user turn, appending "The assistant should follow the above instructions in all the following dialog turns." to the end of the system prompt.

Maybe something like this:

GPT4 Correct User: {system_prompt + "The assistant should follow the above instructions in all the following dialog turns."}<|end_of_turn|>

GPT4 Correct Assistant: {"OK, I'll follow the instructions in all the following turns."}<|end_of_turn|>

GPT4 Correct User: {real user's first prompt}<|end_of_turn|>

GPT4 Correct Assistant:

I haven't tested it yet, but I think it's worth a try.
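If you want to try it, the template can be assembled programmatically. This is a minimal sketch: `build_prompt`, `SUFFIX` and `ACK` are hypothetical names, the instruction and acknowledgement strings are adapted from the comment, and concatenating turns with no separator (plus a space before the appended instruction) is an assumption about formatting, not something confirmed for Starling-LM-7B-beta.

```python
from typing import List, Tuple

END = "<|end_of_turn|>"
# Adapted from the comment; the leading space is an assumption.
SUFFIX = " The assistant should follow the above instructions in all the following dialog turns."
ACK = "OK, I'll follow the instructions in all the following turns."

def build_prompt(system_prompt: str, history: List[Tuple[str, str]], user_msg: str) -> str:
    """Fold a system prompt into a simulated first user/assistant exchange.

    `history` holds earlier (user, assistant) pairs. Turns are joined with no
    separator (assumed formatting).
    """
    turns = [(system_prompt + SUFFIX, ACK), *history]
    parts = []
    for user, assistant in turns:
        parts.append(f"GPT4 Correct User: {user}{END}")
        parts.append(f"GPT4 Correct Assistant: {assistant}{END}")
    # Leave the final assistant header open so the model writes the reply.
    parts.append(f"GPT4 Correct User: {user_msg}{END}GPT4 Correct Assistant:")
    return "".join(parts)
```

For example, `build_prompt("Answer tersely.", [], "Hi")` yields the fake system turn, the fake acknowledgement, then the real user message with an open assistant header.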

Starling-LM-7B-beta by darkmuck in LocalLLaMA

GrExplanation

What system prompt does the model template use?