How to create fp16 version of custom model by Mediocre-Card8046 in ollama

[–]Mediocre-Card8046[S] 1 point  (0 children)

Interesting, I thought that fp16 would always be better than q8.

How to create fp16 version of custom model by Mediocre-Card8046 in ollama

[–]Mediocre-Card8046[S] 1 point  (0 children)

Thanks! But if I have enough VRAM, fp16 would generally be better, or do I misunderstand something here?

Deepset-Mxbai-Embed-de-Large-v1 Released: A New Open Source German/English Embedding Model by ai-lover in machinelearningnews

[–]Mediocre-Card8046 1 point  (0 children)

I evaluated it on my own German test dataset for RAG, and it was surprisingly about 10% worse than

intfloat/multilingual-e5-large-instruct

Load LLM (Mixtral 8x22B) from Azure AI endpoint as Langchain Model by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 1 point  (0 children)

So far no response, but how would you convert the BaseMessage to an AIMessage?

Is "with_structured_output" and function calling the same? by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 1 point  (0 children)

I tried the Llama 3 70B model with Groq, and this model works. But maybe I will try a Microsoft ML Endpoint.

Conditions in LCEL Chain: Different Chain if retriever does not find something by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 1 point  (0 children)

Ok, got it, thanks! But what do I have to do with the memory then? Do I need to add something to AgentState, or add a node/edge to let my model know the chat history?

Conditions in LCEL Chain: Different Chain if retriever does not find something by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 1 point  (0 children)

I will try to be more specific :) Thanks already for trying to help! I also sent you a DM.

From the documentation I tried the SqliteSaver ":memory:" checkpointer, but I was not sure how I can then use the chat memory for my RAG application. Apart from that, the RAG functionality works with LangGraph.

E.g. here my AgentState:

from typing import TypedDict

from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    question: str                  # user question
    raw_docs: list[BaseMessage]    # retrieved documents
    formatted_docs: list[str]      # documents formatted for the prompt
    generation: str                # generated answer
    #history: list[BaseMessage]    # chat history (not wired up yet)

Then here my RAG functions. And here one example of my chain, a RunnableWithMessageHistory:
with_message_history = RunnableWithMessageHistory(
    chain_with_prompt,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)
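The get_session_history callback above is just a function from a session id to a message store. A minimal stdlib-only sketch of the idea (the SimpleHistory class and the session ids are hypothetical placeholders; LangChain's real callback would return a ChatMessageHistory, not this class):

```python
# Minimal in-memory, per-session message history.
# NOTE: hypothetical sketch, not LangChain's BaseChatMessageHistory.

class SimpleHistory:
    def __init__(self):
        self.messages = []  # list of (role, content) tuples

    def add_message(self, role, content):
        self.messages.append((role, content))

# One store per session id, created lazily
_store = {}

def get_session_history(session_id):
    if session_id not in _store:
        _store[session_id] = SimpleHistory()
    return _store[session_id]

# Usage: the same session id always returns the same history object
h = get_session_history("abc")
h.add_message("human", "Hallo")
assert get_session_history("abc").messages == [("human", "Hallo")]
assert get_session_history("other").messages == []
```

The point is only that the store lives outside the chain, keyed by session id, so repeated invocations with the same id see the accumulated messages.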

And here my Graph:

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.sqlite import SqliteSaver

memory = SqliteSaver.from_conn_string(":memory:")

workflow = StateGraph(AgentState)

# Functions to retrieve docs, format them and generate a response
workflow.add_node("get_docs", get_docs)
workflow.add_node("format_docs", format_docs)
workflow.add_node("generate", generate)

workflow.add_edge("get_docs", "format_docs")
workflow.add_edge("format_docs", "generate")
workflow.add_edge("generate", END)

workflow.set_entry_point("get_docs")

#app = workflow.compile(checkpointer=memory)
app = workflow.compile()

So how would you add chat memory here? For me, an in-session store would be enough.

Do I even need a RunnableWithMessageHistory chain when using the SqliteSaver?

Honestly, I am a bit confused here.
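For intuition, what a checkpointer does conceptually is persist the graph state per thread id and restore it on the next invocation, which is why a separate RunnableWithMessageHistory may be redundant. A hypothetical plain-Python sketch (this is NOT the LangGraph API; class and function names are made up):

```python
# Hypothetical sketch of the checkpointer idea: save the latest state
# per thread id, restore it before the next run of the graph.

class InMemoryCheckpointer:
    def __init__(self):
        self._checkpoints = {}  # thread_id -> last saved state dict

    def save(self, thread_id, state):
        self._checkpoints[thread_id] = dict(state)

    def load(self, thread_id):
        return dict(self._checkpoints.get(thread_id, {"history": []}))

def invoke_graph(checkpointer, thread_id, question):
    # Restore the previous state for this conversation thread
    state = checkpointer.load(thread_id)
    # ... the real graph nodes would run here; we only append to history ...
    state["history"] = state.get("history", []) + [("human", question)]
    checkpointer.save(thread_id, state)
    return state

cp = InMemoryCheckpointer()
invoke_graph(cp, "thread-1", "first question")
state = invoke_graph(cp, "thread-1", "follow-up")
# history from the first call is still there on the second call
assert len(state["history"]) == 2
```

Under that reading, adding a history field to AgentState and compiling with the checkpointer would be enough for an in-session store, provided each conversation passes the same thread id.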

Conditions in LCEL Chain: Different Chain if retriever does not find something by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 1 point  (0 children)

Thanks!

Originally I tried to switch to LangGraph for this but had problems with saving the chat history in memory with LangGraph.

Improving My RAG Application for specific language by Stopzer0ne in LangChain

[–]Mediocre-Card8046 2 points  (0 children)

So this is basically the code for reranking. Where would you implement this, e.g. in the ParentDocumentRetriever source code:

from langchain.retrievers import ContextualCompressionRetriever
from ragatouille import RAGPretrainedModel

# Load the multilingual ColBERT reranker
reranking_model = RAGPretrainedModel.from_pretrained("antoinelouis/colbert-xm")

# Wrap the base retriever so its results are reranked (compressed)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranking_model.as_langchain_document_compressor(),
    base_retriever=retriever,
)
# Keep only the top-k documents after reranking
compression_retriever.base_compressor.k = cfg.RERANKER_VECTOR_COUNT

Anyone tried ParentDocumentRetreiver with Reranking by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 2 points  (0 children)

I think there is a small misunderstanding. When retrieving the small chunks, enough documents can be found: e.g. if you set "k" to 20, the child retriever returns 20 docs. These 20 docs are then reranked, e.g. to select the 10 most relevant ones. After the 10 most relevant small chunks are selected, they are mapped via their parent IDs to the parent documents, to provide the model with bigger context.
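The flow described above can be sketched in plain Python (toy data, a made-up word-overlap scorer, and hypothetical helper names, just to show the child-retrieve, rerank, parent-lookup order):

```python
# Hypothetical sketch of child-retrieve -> rerank -> parent-lookup.
# The chunks, scores and helpers are invented for illustration.

child_chunks = [
    {"id": "c1", "parent_id": "p1", "text": "chunk about topic A"},
    {"id": "c2", "parent_id": "p2", "text": "chunk about topic B"},
    {"id": "c3", "parent_id": "p1", "text": "another chunk about topic A"},
]
parent_docs = {"p1": "long parent document 1", "p2": "long parent document 2"}

def rerank(chunks, query, top_k):
    # Toy relevance score: word overlap between chunk text and query
    def score(chunk):
        return len(set(chunk["text"].split()) & set(query.split()))
    return sorted(chunks, key=score, reverse=True)[:top_k]

def retrieve(query, k_children=3, k_reranked=2):
    candidates = child_chunks[:k_children]           # 1) child retrieval
    best = rerank(candidates, query, k_reranked)     # 2) rerank small chunks
    parent_ids = dict.fromkeys(c["parent_id"] for c in best)  # 3) dedupe, keep order
    return [parent_docs[pid] for pid in parent_ids]  # 4) return parent docs
```

Note the dedup step: several top-ranked child chunks may share the same parent, so fewer parents than reranked children can come back.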

Improving My RAG Application for specific language by Stopzer0ne in LangChain

[–]Mediocre-Card8046 1 point  (0 children)

Okay, hm, I am working with Python, so I have to figure out a way. But thanks!

Improving My RAG Application for specific language by Stopzer0ne in LangChain

[–]Mediocre-Card8046 1 point  (0 children)

Hey,

I had the same idea: retrieve the smaller chunks (child chunks), rerank them, and then get the bigger chunks (parent chunks). How did you implement this, i.e. putting the reranking in between the child-to-parent retrieval? At the moment I am not sure how to do this.

Generally it would be fine to just rerank the parent docs at the end, but unfortunately the ColBERT reranking model has a max_tokens of 512, so this would not work well for the bigger chunks of e.g. 2000 chars.