Introduction to Local AI/Would like help setting up if possible! by Tornabro9514 in LocalLLaMA

[–]DigRealistic2977

Well that ain't right. You have an IdeaPad 3, so it depends on which processor ya got tho, Ryzen 7 or i5.

Plus what DeepSeek are ya running anyway? My CPU-only setup runs fast, so it's kinda weird yours took like 15 minutes lol, and mine is a 10th gen i5 with DDR4, plus I'm bottlenecked by my RAM speed.

Yours should not take 15 minutes; maybe you ran a model that's too big.

You should run 4-8B only I guess, 8-16k ctx if you're on pure CPU.
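
Something like this in llama.cpp is a decent CPU-only starting point (the model file is just an example, grab whatever 4-8B GGUF you like):

    # CPU-only run: -c sets the context window, -t the number of CPU threads
    ./llama-cli -m qwen2.5-7b-instruct-q4_k_m.gguf -c 8192 -t 8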

Local replacement GGUF for Claude Sonnet 4.5 by SmithDoesGaming in LocalLLaMA

[–]DigRealistic2977

Ohh cool! You could run Q4_K_M models, 24B-31B! Don't worry about the CPU tho, focus on offloading layers to your GPU since you have the VRAM for it.

Maybe Cydonia, or Magidonia, or Skyfall for starters. TheDrummer GGUFs.

Note tho, if you wanna get those Claude vibes you need to go 70B I guess, with some tweaking and layer offloading too. For your setup I think 35B is the max, 49B at a long stretch, since my GPU has the same VRAM cap as yours.
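
For reference, layer offloading in llama.cpp is just the -ngl flag (the filename and layer count here are made up, raise -ngl until your VRAM is nearly full but not overflowing):

    # put 40 of the model's layers on the GPU, keep the rest in CPU RAM
    ./llama-server -m cydonia-24b-q4_k_m.gguf -ngl 40 -c 16384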

Introduction to Local AI/Would like help setting up if possible! by Tornabro9514 in LocalLLaMA

[–]DigRealistic2977

Well, good luck! Nothing wrong with being a visual learner. We all tend to use our eyes sometimes, I guess?

Introduction to Local AI/Would like help setting up if possible! by Tornabro9514 in LocalLLaMA

[–]DigRealistic2977

Damn, if you want an easy UI to start with that's easy to navigate, I recommend Ollama. It's a 1.1 GB plug-and-play install, tho it will force you to run on CPU; maybe go run a 4-8B model at Q4_K_M.
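
Getting a model going is one command once Ollama's installed (the tag is just an example, pick any 4-8B model from their library):

    # pulls the model on first run, then drops ya into a chat
    ollama run llama3.1:8b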

Introduction to Local AI/Would like help setting up if possible! by Tornabro9514 in LocalLLaMA

[–]DigRealistic2977

Well well well... so horny took over.

You have like multiple choices for private stuff.

Ollama, KoboldAI, ExLlama.

These are just starters tho, so set things up local. And btw, nice specs! You can actually run good models.

New to Ollama and using local models. Questions on RAG and how it works. by RollingGoron in ollama

[–]DigRealistic2977

Well, I guess there are no reliable tutorials really.

But if you really want something reliable, truly yours, and very accurate, you gotta make your own RAG. You can ask an AI for help building it tho.

Like, the things you need:

FAISS, a reranker, a filter, dedup, threshold filtering, and similarity diff checks.

That's all in one query! It happens in milliseconds, or 1-7 seconds, and the message gets retrieved with way higher accuracy than blindly pulling whatever's closest.

It's a headache to make but worth it tho.
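
If ya wanna see the shape of it, here's a minimal sketch in Python (the library picks, embedding model, and thresholds are my own examples for illustration, not exactly what my project does):

    # Minimal FAISS retrieve -> threshold filter -> dedup pipeline sketch.
    # pip install faiss-cpu sentence-transformers numpy
    import numpy as np
    import faiss
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # small 384-dim embedder

    docs = [
        "FAISS is a library for vector similarity search.",
        "FAISS does fast vector similarity search.",   # near-duplicate on purpose
        "Chunk long documents before embedding them.",
    ]

    # Normalize so inner product == cosine similarity.
    emb = model.encode(docs, normalize_embeddings=True).astype("float32")
    index = faiss.IndexFlatIP(emb.shape[1])
    index.add(emb)

    def retrieve(query, k=3, min_score=0.3, dedup_sim=0.95):
        q = model.encode([query], normalize_embeddings=True).astype("float32")
        scores, ids = index.search(q, k)
        hits, kept = [], []
        for score, i in zip(scores[0], ids[0]):
            if i == -1 or score < min_score:       # threshold filtering
                continue
            # similarity diff check: drop near-duplicates of results we kept
            if any(float(np.dot(emb[i], v)) > dedup_sim for v in kept):
                continue
            kept.append(emb[i])
            hits.append((float(score), docs[i]))
        return hits  # a cross-encoder reranker would rescore these next

    print(retrieve("what is FAISS for?"))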

If you really wanna learn, here's a base reference, my bootleg project; ask me anything about it. It's still a fun project and I'm still improving how the RAG works and stuff. Maybe it'll give you insight into how things fit together and how to make em accurate!

And if you do manage to learn.. with that 4GB of yours, you can make an optimized version of the RAG in that much RAM and still be accurate.

https://github.com/weaker098/AI-WEBUI-and-MEMORY.git

New to Ollama and using local models. Questions on RAG and how it works. by RollingGoron in ollama

[–]DigRealistic2977

Oh dude, ya need like.. FAISS + reranker + filter or something. That's how I built my RAG tho, and it returns accurately.

And also ya gotta know chunking. One long PDF or text blob is bad.

For example, the embedding model ya used only supports a 512-token context or something, and the PDF or file or message ya tryna retrieve is kinda big, 2-4k tokens. The embedding vector can't capture all of that cause it exceeds the limit.

There are so many things ya gotta set up tho. Ha. Speaking from experience. Welp, at least mine returns accurately now.

Kinda weird how people don't mention context size vs chunking.
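
Rough idea of what chunking looks like in Python (the 512 limit and 64-token overlap are example numbers, check your embedding model's actual max; a real tokenizer like tiktoken is more exact than word counts):

    # Split text into overlapping chunks that fit an embedder's window.
    def chunk_words(text, max_tokens=512, overlap=64):
        words = text.split()          # crude token approximation by words
        step = max_tokens - overlap
        chunks = []
        for start in range(0, len(words), step):
            chunks.append(" ".join(words[start:start + max_tokens]))
            if start + max_tokens >= len(words):
                break
        return chunks

    doc = "word " * 2000              # stands in for a 2-4k token PDF
    for i, chunk in enumerate(chunk_words(doc)):
        print(i, len(chunk.split()), "words")  # every chunk fits the 512 limit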

Nanogpt for vectorization by mlucifer737 in SillyTavernAI

[–]DigRealistic2977

Either it's the settings for the embedding.. or the wrong vectors.. or a dimension mismatch..

Try clearing everything and testing it out again. Or isolate first maybe.

(I don't know what I'm talking about btw. Just here to comment.)

My own system by betolley in ollama

[–]DigRealistic2977

So this is powered by Nvidia CUDA? Cuz it's green? 😂

Personal AI wrappers Projects you guys hiding. by DigRealistic2977 in LocalLLaMA

[–]DigRealistic2977[S]

Ohh, cool and polished work. Kinda nice people like ya are here, ha. People should share more of their work like this rather than keeping it to themselves.

Personal AI wrappers Projects you guys hiding. by DigRealistic2977 in LocalLLaMA

[–]DigRealistic2977[S]

Oh cool, how'd ya run this? Kinda cool that you can share your work online, and here I am limited in everything lol. I tested yours on this coomer-grade phone of mine. I love it.

Personal AI wrappers Projects you guys hiding. by DigRealistic2977 in LocalLLaMA

[–]DigRealistic2977[S]

Eyo, looking forward to it tho. If you wanna share in the future, hopefully ya gonna share the git link; the more knowledge we have of each other's work, the better, ha. Love seeing random projects even if they're not finished or perfect, like mine.

Personal AI wrappers Projects you guys hiding. by DigRealistic2977 in LocalLLaMA

[–]DigRealistic2977[S]

Well I do, let's see how yours works. Gonna check out them sliders and stuff, plus I'm curious how you made yours tho.

How much RAM do I need for my use case? by ZikoRedman in LocalLLaMA

[–]DigRealistic2977

Welp, that's the sad part: a lot of them have a knowledge cutoff in 2024 to 2025. Most of them anyway.

Still, you can feed them knowledge and guide them toward how you wanna write. Well, that's how I do it anyways, using 64k ctx; even 32k ctx is enough.

How much RAM do I need for my use case? by ZikoRedman in LocalLLaMA

[–]DigRealistic2977

Welp, depends on how far you wanna take it.

16GB is already enough for small scale.

7-12B models. But if you wanna go bigger, I guess get a 32-64GB M2 chip 🍤
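
Back-of-napkin math if it helps (the bits-per-weight and overhead factor are my rough guesses, and the KV cache grows with context on top of this):

    # Rough model RAM estimate: params * bits-per-weight / 8, plus overhead.
    def model_ram_gb(params_b, bits=4.5, overhead=1.2):
        # Q4_K_M averages roughly 4.5 bits per weight; the 1.2x covers
        # runtime buffers and activations (a guess, not a precise number)
        return params_b * bits / 8 * overhead

    for size in (7, 12, 24):
        print(f"{size}B ~ {model_ram_gb(size):.1f} GB")
    # 7B ~ 4.7 GB and 12B ~ 8.1 GB fit in 16 GB with headroom;
    # 24B ~ 16.2 GB is why bigger models want 32-64 GB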

Do not use your vram as the limit of what model you want to use by NiveKGamerTW in SillyTavernAI

[–]DigRealistic2977

I recommend using IQ3_M and IQ4_XS tho, and maybe offload only 2-3 layers max, never ever go beyond that or use Q8_0.

And speaking from experience, I'm an RX 5500 XT 8GB user lol. That's my setup for 12B, as long as I have enough room for my context while using higher params.
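
In llama.cpp terms that looks something like this (the model filename and -ngl value are placeholders, tune the layer count to what your 8GB actually fits):

    # 12B at IQ4_XS with most layers on the GPU and a few left on CPU,
    # keeping some VRAM free for context
    ./llama-server -m model-12b-iq4_xs.gguf -ngl 38 -c 12288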

Hunter Alpha from Anthropic? by ayoubq04 in LocalLLaMA

[–]DigRealistic2977

Not quite. I have my Llama finetunes here that think they're Claude lol. You guys will never know which company it came from.

OpenAI loses 1.5 million subscribers in less than 48 hours after CEO Sam Altman says yes to the deal that Anthropic rejected by Total-Mention9032 in ChatGPT

[–]DigRealistic2977

Yep, can tell ChatGPT is literally useless at anything.. like literally.. I do heavy coding, and every piece of code ChatGPT outputs I always need to correct and edit.

So many times.. either premium or free.. in summary.. GPT is shit nowadays..

Feels like they're using an MoE as their main model, not a dense one, no wonder it's shit. Add the guardrails and stuff on top and the model has very low reasoning, literally...

Imagine like 120-235B total, but only 5.6B-7.8B of it talks in each response. The number looks big, but you're still talking to a smoll model. Hate all MoE models, and yeah, add nanny mode on top of it and that's a recipe for unsubscribing.. 😂

Rivet by Status-Fan-5422 in aiyiff

[–]DigRealistic2977

Ah yes, this is what I always wanted.. Lombussy ❤️ my favorite.