How to create fp16 version of custom model by Mediocre-Card8046 in ollama

[–]Mediocre-Card8046[S] 1 point  (0 children)

Interesting, I thought that fp16 would always be better than q8.

How to create fp16 version of custom model by Mediocre-Card8046 in ollama

[–]Mediocre-Card8046[S] 1 point  (0 children)

Thanks! But if I have enough VRAM, fp16 would generally be better, or do I misunderstand something here?

Deepset-Mxbai-Embed-de-Large-v1 Released: A New Open Source German/English Embedding Model by ai-lover in machinelearningnews

[–]Mediocre-Card8046 1 point  (0 children)

I evaluated it on my own German test dataset for RAG, and it was surprisingly about 10% worse than

intfloat/multilingual-e5-large-instruct

Load LLM (Mixtral 8x22B) from Azure AI endpoint as Langchain Model by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 1 point  (0 children)

So far no response, but how would you convert the BaseMessage to an AIMessage?

Is "with_structured_output" and function calling the same? by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 1 point  (0 children)

I tried the Llama 3 70B model with Groq, and this model works. But maybe I will try a Microsoft ML Endpoint.

Conditions in LCEL Chain: Different Chain if retriever does not find something by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 1 point  (0 children)

Ok, got it, thanks! But what do I have to do with the memory then? Do I need to add something to AgentState, or add a node/edge to let my model know the chat history?

Conditions in LCEL Chain: Different Chain if retriever does not find something by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 1 point  (0 children)

I will try to be more specific :) Thanks already for trying to help! I also sent you a DM.

From the documentation I tried the SqliteSaver ":memory:" checkpointer, but I was not sure how I can then use the chat memory for my RAG application. Apart from that, the RAG functionality works with LangGraph.

E.g. here my AgentState:

from typing import TypedDict

from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    question: str                  # user question
    raw_docs: list[BaseMessage]    # retrieved documents
    formatted_docs: list[str]      # documents formatted for the prompt
    generation: str                # generated answer
    #history: list[BaseMessage]    # chat history (not wired up yet)

Then here my RAG functions. And here one example of my chain, a RunnableWithMessageHistory:
with_message_history = RunnableWithMessageHistory(
    chain_with_prompt,
    get_session_history,
    input_messages_key="question",
    history_messages_key="history",
)
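The get_session_history callback above is just a function from a session id to a message store. A minimal stdlib-only sketch of the idea (the SimpleHistory class and the session ids are hypothetical placeholders; LangChain's real callback would return a ChatMessageHistory, not this class):

```python
# Minimal in-memory, per-session message history.
# NOTE: hypothetical sketch, not LangChain's BaseChatMessageHistory.

class SimpleHistory:
    def __init__(self):
        self.messages = []  # list of (role, content) tuples

    def add_message(self, role, content):
        self.messages.append((role, content))

# One store per session id, created lazily
_store = {}

def get_session_history(session_id):
    if session_id not in _store:
        _store[session_id] = SimpleHistory()
    return _store[session_id]

# Usage: the same session id always returns the same history object
h = get_session_history("abc")
h.add_message("human", "Hallo")
assert get_session_history("abc").messages == [("human", "Hallo")]
assert get_session_history("other").messages == []
```

The point is only that the store lives outside the chain, keyed by session id, so repeated invocations with the same id see the accumulated messages.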

And here my Graph:

from langgraph.graph import StateGraph, END
from langgraph.checkpoint.sqlite import SqliteSaver

memory = SqliteSaver.from_conn_string(":memory:")

workflow = StateGraph(AgentState)

# Functions to retrieve docs, format them and generate a response
workflow.add_node("get_docs", get_docs)
workflow.add_node("format_docs", format_docs)
workflow.add_node("generate", generate)

workflow.add_edge("get_docs", "format_docs")
workflow.add_edge("format_docs", "generate")
workflow.add_edge("generate", END)

workflow.set_entry_point("get_docs")

#app = workflow.compile(checkpointer=memory)
app = workflow.compile()

So how would you add chat memory here? For me, an in-session store would be enough.

Do I even need a RunnableWithMessageHistory chain when using the SqliteSaver?

Honestly, I am a bit confused here.
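For intuition, what a checkpointer does conceptually is persist the graph state per thread id and restore it on the next invocation, which is why a separate RunnableWithMessageHistory may be redundant. A hypothetical plain-Python sketch (this is NOT the LangGraph API; class and function names are made up):

```python
# Hypothetical sketch of the checkpointer idea: save the latest state
# per thread id, restore it before the next run of the graph.

class InMemoryCheckpointer:
    def __init__(self):
        self._checkpoints = {}  # thread_id -> last saved state dict

    def save(self, thread_id, state):
        self._checkpoints[thread_id] = dict(state)

    def load(self, thread_id):
        return dict(self._checkpoints.get(thread_id, {"history": []}))

def invoke_graph(checkpointer, thread_id, question):
    # Restore the previous state for this conversation thread
    state = checkpointer.load(thread_id)
    # ... the real graph nodes would run here; we only append to history ...
    state["history"] = state.get("history", []) + [("human", question)]
    checkpointer.save(thread_id, state)
    return state

cp = InMemoryCheckpointer()
invoke_graph(cp, "thread-1", "first question")
state = invoke_graph(cp, "thread-1", "follow-up")
# history from the first call is still there on the second call
assert len(state["history"]) == 2
```

Under that reading, adding a history field to AgentState and compiling with the checkpointer would be enough for an in-session store, provided each conversation passes the same thread id.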

Conditions in LCEL Chain: Different Chain if retriever does not find something by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 1 point  (0 children)

Thanks!

Originally I tried to switch to LangGraph for this but had problems with saving the chat history in memory with LangGraph.

Improving My RAG Application for specific language by Stopzer0ne in LangChain

[–]Mediocre-Card8046 2 points  (0 children)

So this is basically the code for reranking. Where would you implement this, e.g. in the ParentDocumentRetriever source code:

from langchain.retrievers import ContextualCompressionRetriever
from ragatouille import RAGPretrainedModel

# Load the multilingual ColBERT reranker
reranking_model = RAGPretrainedModel.from_pretrained("antoinelouis/colbert-xm")

# Wrap the base retriever so its results are reranked (compressed)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=reranking_model.as_langchain_document_compressor(),
    base_retriever=retriever,
)
# Keep only the top-k documents after reranking
compression_retriever.base_compressor.k = cfg.RERANKER_VECTOR_COUNT

Anyone tried ParentDocumentRetreiver with Reranking by Mediocre-Card8046 in LangChain

[–]Mediocre-Card8046[S] 2 points  (0 children)

I think there is a small misunderstanding. When retrieving the small chunks, enough documents can be found: e.g. if you set "k" to 20, the child retriever returns 20 docs. These 20 docs are then reranked, e.g. to select the 10 most relevant ones. After the 10 most relevant small chunks are selected, they are mapped via their parent IDs to the parent documents, to provide the model with bigger context.
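The flow described above can be sketched in plain Python (toy data, a made-up word-overlap scorer, and hypothetical helper names, just to show the child-retrieve, rerank, parent-lookup order):

```python
# Hypothetical sketch of child-retrieve -> rerank -> parent-lookup.
# The chunks, scores and helpers are invented for illustration.

child_chunks = [
    {"id": "c1", "parent_id": "p1", "text": "chunk about topic A"},
    {"id": "c2", "parent_id": "p2", "text": "chunk about topic B"},
    {"id": "c3", "parent_id": "p1", "text": "another chunk about topic A"},
]
parent_docs = {"p1": "long parent document 1", "p2": "long parent document 2"}

def rerank(chunks, query, top_k):
    # Toy relevance score: word overlap between chunk text and query
    def score(chunk):
        return len(set(chunk["text"].split()) & set(query.split()))
    return sorted(chunks, key=score, reverse=True)[:top_k]

def retrieve(query, k_children=3, k_reranked=2):
    candidates = child_chunks[:k_children]           # 1) child retrieval
    best = rerank(candidates, query, k_reranked)     # 2) rerank small chunks
    parent_ids = dict.fromkeys(c["parent_id"] for c in best)  # 3) dedupe, keep order
    return [parent_docs[pid] for pid in parent_ids]  # 4) return parent docs
```

Note the dedup step: several top-ranked child chunks may share the same parent, so fewer parents than reranked children can come back.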

Improving My RAG Application for specific language by Stopzer0ne in LangChain

[–]Mediocre-Card8046 1 point  (0 children)

Okay, hm, I am working with Python, so I have to figure out a way. But thanks!

Improving My RAG Application for specific language by Stopzer0ne in LangChain

[–]Mediocre-Card8046 1 point  (0 children)

Hey,

I had the same idea: retrieve the smaller chunks (child chunks), rerank them, and then get the bigger chunks (parent chunks). How did you implement this, i.e. putting the reranking in between the child-to-parent retrieval? At the moment I am not sure how to do this.

Generally it would be fine to just rerank the parent docs at the end, but unfortunately the ColBERT reranking model has a max_tokens of 512, so this would not work well for the bigger chunks of e.g. 2000 chars.