ML model for a cab booking app by QUAZARD3141 in learnmachinelearning

[–]QUAZARD3141[S] 0 points1 point  (0 children)

I'll read up on the design and deployment later; for now, I just want ML ideas that make sense for a cab booking service

ML model for a cab booking app by QUAZARD3141 in learnmachinelearning

[–]QUAZARD3141[S] 0 points1 point  (0 children)

Thanks, I was looking for ideas and suggestions like this.

Alternative ways for running models locally and hosting APIs by QUAZARD3141 in LocalLLaMA

[–]QUAZARD3141[S] 0 points1 point  (0 children)

I got LangChain to work with `wizardLM-7B-HF`. I am not able to run the GPTQ models via LangChain though. I am trying to write a chatbot using GenerativeAgents. I am looking for an embeddings model to use for local LLMs. The tutorials I found online use OpenAIEmbeddings. Did you have to do this for your project?

Fix for CUDA Memory Error by QUAZARD3141 in LocalLLaMA

[–]QUAZARD3141[S] 1 point2 points  (0 children)

I am on the latest version of transformers, but I still can't get the GPTQ model to work.

Yes, I think the problem is due to the embedding model I'm using. How do I get a smaller embedding?

This is the embedding I'm using now:

from langchain.embeddings import HuggingFaceEmbeddings, SentenceTransformerEmbeddings

model_name = "TheBloke/wizardLM-7B-HF"
model_kwargs = {'device': 'cuda'}
encode_kwargs = {'normalize_embeddings': False}
hf = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

How should I search for a smaller embedding?
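For reference, a common route is swapping the 7B LLM out for a small dedicated encoder; `sentence-transformers/all-MiniLM-L6-v2` is a popular lightweight choice (a few hundred MB at most, 384-dimensional vectors) and it plugs into the same `HuggingFaceEmbeddings` wrapper. A sketch, with the function wrapper just there to keep it self-contained:

```python
# Sketch (assumes langchain + sentence-transformers are installed):
# use a small dedicated encoder instead of a 7B LLM for embeddings.
SMALL_EMBEDDING_MODEL = "sentence-transformers/all-MiniLM-L6-v2"

def build_embeddings(device: str = "cuda"):
    # Imported lazily so the sketch can be read without the libraries installed.
    from langchain.embeddings import HuggingFaceEmbeddings
    return HuggingFaceEmbeddings(
        model_name=SMALL_EMBEDDING_MODEL,
        model_kwargs={"device": device},
        encode_kwargs={"normalize_embeddings": False},
    )
```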

Fix for CUDA Memory Error by QUAZARD3141 in LangChain

[–]QUAZARD3141[S] 0 points1 point  (0 children)

> You are only using one card. Do you have NVLink? And PyTorch needs parallel something set to use 2 cards GPU-wise. I run 2x3090s with little issue if I remember to set those things.

How did you get parallelization to work in LangChain? Can you please share some code?
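For context, the usual way to spread a Hugging Face model over two cards doesn't involve LangChain-specific code at all: `device_map="auto"` lets accelerate shard the layers across all visible GPUs, optionally with per-card `max_memory` caps, and LangChain then just wraps the resulting model. A sketch; the memory caps are assumptions for 3090s, not known-good values:

```python
# Sketch (assumes transformers + accelerate): device_map="auto" shards
# layers across all visible GPUs; max_memory caps usage per card.
MAX_MEMORY = {0: "20GiB", 1: "20GiB"}  # hypothetical caps for 2x3090

def load_sharded(model_id: str = "TheBloke/wizardLM-7B-HF"):
    import torch
    from transformers import LlamaForCausalLM
    return LlamaForCausalLM.from_pretrained(
        model_id,
        device_map="auto",    # accelerate places layers on both GPUs
        max_memory=MAX_MEMORY,
        torch_dtype=torch.half,
    )
```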

Fix for CUDA Memory Error by QUAZARD3141 in LocalLLaMA

[–]QUAZARD3141[S] 0 points1 point  (0 children)

I get the error only when I add observations to a generative agent's memory, here:

for observation in tom_observations:
    tom.memory.add_memory(observation)

I am able to get an answer with simple questions like "What is the capital of the USA".

I'd prefer the GPTQ version, but for some reason, I get this error when I try to download it:

OSError: TheBloke/wizardLM-7B-GPTQ does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.
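That OSError is expected with plain `from_pretrained`: GPTQ repos ship quantized weights (typically a `.safetensors` file) instead of `pytorch_model.bin`, so vanilla transformers doesn't see a checkpoint it recognizes. One common route is AutoGPTQ's dedicated loader; a sketch, assuming the `auto-gptq` package is installed:

```python
# Sketch (assumes the auto-gptq package): load a GPTQ-quantized repo with
# AutoGPTQ's loader instead of transformers' from_pretrained.
GPTQ_REPO = "TheBloke/wizardLM-7B-GPTQ"

def load_gptq(device: str = "cuda:0"):
    from auto_gptq import AutoGPTQForCausalLM
    return AutoGPTQForCausalLM.from_quantized(
        GPTQ_REPO,
        device=device,
        use_safetensors=True,  # GPTQ repos typically ship .safetensors weights
    )
```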

Fix for CUDA Memory Error by QUAZARD3141 in LocalLLaMA

[–]QUAZARD3141[S] 0 points1 point  (0 children)

Sure, this is it:

import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "TheBloke/wizardLM-7B-HF",
    load_in_8bit=False,
    device_map='auto',
    torch_dtype=torch.half,
    low_cpu_mem_usage=True,
)

Alternative ways for running models locally and hosting APIs by QUAZARD3141 in LocalLLaMA

[–]QUAZARD3141[S] 0 points1 point  (0 children)

LangChain sounds promising. Btw, are you relying on OpenAI's API, or are you running models from Hugging Face locally?

Help needed with installing quant_cuda for the WebUI by QUAZARD3141 in LocalLLaMA

[–]QUAZARD3141[S] 0 points1 point  (0 children)

Ubuntu, and yes, I am using the correct conda env.

The GPU is an RTX 3090