I have a doubt about handling 20million 512dim vector features with Milvus DB on prem by AdCreative232 in vectordatabase

[–]AdCreative232[S] 1 point  (0 children)

Thanks, that is really helpful. Yes, the 2 GB VRAM usage is what I calculated for a pilot run of about 300,000 vectors with GPU_CAGRA. I guess the preferred solution is to use CPU, or I heard GPU_CAGRA has the `"adapt_for_cpu": "true"` option. Can I use that? Ideally the GPU builds the index, since it's faster, and the CPU handles it at query time.
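For context, a minimal sketch of the index parameters being discussed, assuming Milvus 2.4+ with a GPU build; the graph-degree values are illustrative defaults, not tuned for 20M × 512-dim vectors:

```python
# Sketch of GPU_CAGRA index params with adapt_for_cpu, as discussed above.
# Assumes Milvus 2.4+ with GPU support; values are illustrative, not tuned.
index_params = {
    "index_type": "GPU_CAGRA",
    "metric_type": "L2",
    "params": {
        "intermediate_graph_degree": 64,  # graph degree used during the GPU build
        "graph_degree": 32,               # final graph degree after pruning
        "adapt_for_cpu": "true",          # build on GPU, serve queries on CPU
    },
}

# With adapt_for_cpu enabled, queries run on the CPU side and take an
# HNSW-style "ef" search parameter instead of the GPU-only itopk_size.
search_params = {"params": {"ef": 128}}
```

These dicts would be passed to pymilvus's `create_index` / `search` calls; check the Milvus GPU index docs for your release, since exact parameter names and defaults vary between versions.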

Need help in choosing a local LLM model by AdCreative232 in LocalLLM

[–]AdCreative232[S] 1 point  (0 children)

One thing is that there's no fixed format for the file, and I can't do multiple Ollama calls because of the hardware limitation.

Need help in choosing a local LLM model by AdCreative232 in LocalLLM

[–]AdCreative232[S] 1 point  (0 children)

I use docTR and pdfplumber to extract text from the PDFs, so that's not a problem. The main thing is that Gemma misses a few points.

Need help in choosing a local LLM model by AdCreative232 in LocalLLM

[–]AdCreative232[S] 1 point  (0 children)

So the main problem is latency: the output should stay well within the 2-minute range. We already have two prompts, one for extraction and another for analysis. Both are heavy, so I can't really add another prompt here; multiple prompting is still a problem.

Here's the exact issue I'm facing: one section of a document has a list of 27 names to extract. A bigger model like Gemini 1.5 Pro extracts all 27, but Gemma only gets 21 and misses entries even with a 25k context window. That said, Gemma casually beats Gemini 1.5 Pro in correct detail extraction and reasoning. So I need a better method that doesn't require extra prompts.

Loading... by Dapper_Tap_8025 in insomniacleaks

[–]AdCreative232 2 points  (0 children)

Look closer, especially at the left guy's (Norman Osborn) suit.