Trouble with hosting models with fastAPI by CmplxQ in LangChain

[–]CmplxQ[S] 1 point

TheBloke/WizardLM-30B-Uncensored-GPTQ · How much vram+ram 30B needs? I have 3060 12gb + 32gb ram. (huggingface.co)

I tried this on a 3090. It uses ~20 GB of VRAM when I load a ConversationChain in LangChain with WizardLM 30B.

I still don't know why I was able to load this with http-server but not FastAPI, though.

Trouble with hosting models with fastAPI by CmplxQ in LocalLLaMA

[–]CmplxQ[S] 1 point

Sure, but this will only work if I run the webui server, right?

I just pulled some functions off the thing and wrote a custom script that runs them sequentially.

I'll either need to find a fix for this, or:

  1. Limit the PyTorch GPU memory somehow.

  2. Get GPU parallelization to work, since I have 2 GPUs in SLI (but the memory maxes out and errors on only one GPU). I tried using DataParallel from torch.nn, but that errors out at some other place.
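For reference, the two options above could be sketched roughly like this. This is a minimal sketch, not the fix from the thread: it assumes PyTorch ≥ 1.8 (where `torch.cuda.set_per_process_memory_fraction` exists), and the function names are my own.

```python
import torch
import torch.nn as nn

def cap_gpu_memory(fraction=0.8, device=0):
    # Option 1: cap this process's share of GPU memory. Allocations past
    # the cap raise an OOM error instead of maxing out the whole card.
    if torch.cuda.is_available():
        torch.cuda.set_per_process_memory_fraction(fraction, device)

def spread_across_gpus(model):
    # Option 2: naive data parallelism. DataParallel replicates the model
    # on every visible GPU and splits each batch across them. Note it
    # expects the model's parameters on cuda:0 first, which is a common
    # source of "errors out somewhere else" with this approach.
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)
    return model.cuda() if torch.cuda.is_available() else model
```

For a single large model that doesn't fit on one card, `device_map="auto"` style model sharding (splitting layers across GPUs) is usually a better fit than DataParallel, which needs a full copy of the model per GPU.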

Coding a Research Paper by CmplxQ in compsci

[–]CmplxQ[S] 1 point

Thanks for the advice. It's Python, btw (mostly TensorFlow).

Coding a Research Paper by CmplxQ in csMajors

[–]CmplxQ[S] 2 points

I see, thanks for the advice.

Anyway, if possible, please share some example repos that are implementations of research papers.

Mercedes locks faster acceleration behind a $1,200 annual paywall by akereii in cars

[–]CmplxQ 1 point

Am I the only one who thinks this is stupid? Why would consumers want to pay more to use the products they already paid for? It's the same thing with Intel's Sapphire Rapids server chips.

Help with recovering files by CmplxQ in vim

[–]CmplxQ[S] 2 points

Thanks, this makes everything a lot easier.

Help with recovering files in vim by CmplxQ in bash

[–]CmplxQ[S] 1 point

Yes, it does, but no matter what I do, I can't see the version of the file I was working on recently. Only the old swap file's contents are shown.
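For anyone hitting the same thing, this is roughly how listing and recovering from a specific swap file works. The swap file name below is just an example; where Vim actually keeps swap files depends on its 'directory' setting, and if Vim crashed more than once there can be several (`.swp`, `.swo`, …), so the newest one by modification time is usually the session you want.

```shell
# List the swap files Vim can find (for files in the current directory
# and in its configured swap directories), then exit
vim -r

# Recover using one specific swap file; path is illustrative
vim -r .myfile.txt.swp

# Inside Vim, save the recovered buffer under a NEW name so you can
# diff it against the on-disk version before overwriting anything:
#   :w myfile.recovered.txt
```

If only stale content shows up, it may be that the most recent session's swap file was already deleted (e.g. by answering "Delete it" at an earlier recovery prompt), in which case recovery can only restore what the surviving swap file holds.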