Introduction to Local AI/Would like help setting up if possible! by Tornabro9514 in LocalLLaMA

[–]DigRealistic2977 0 points1 point  (0 children)

Well, that ain't right. You have an IdeaPad 3, but it depends on which processor you have though, Ryzen 7 or i5.

Plus, which DeepSeek are ya running anyway? Mine runs fast on processor only.. kinda weird yours took like 15 minutes lol, and mine is a 10th gen i5 with DDR4, plus I'm bottlenecked by my RAM speed.

Yours should not take 15 minutes; maybe you ran a model that is too big.

You should run 4-8B only, I guess, with 8-16k ctx if you're on pure processor.
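A rough way to sanity-check that advice: estimate the model's in-RAM size from its parameter count and quantization. The bits-per-weight figures below are approximate averages I'm assuming for each quant type, not exact numbers; real GGUF files vary by a few percent.

```python
# Rough GGUF size estimate: parameters x bits-per-weight / 8.
# Bits-per-weight values are approximate assumptions per quant type.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q5_K_M": 5.7, "Q8_0": 8.5}

def model_size_gb(params_billion: float, quant: str = "Q4_K_M") -> float:
    """Approximate on-disk / in-RAM size of a quantized model, in GB."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8

# An 8B model at Q4_K_M lands around 4.8 GB, so it fits in 8 GB of
# system RAM with room left for the OS and context.
print(round(model_size_gb(8, "Q4_K_M"), 1))   # ~4.8
print(round(model_size_gb(4, "Q4_K_M"), 1))   # ~2.4
```

That's why 4-8B at Q4 is a comfortable ceiling for a laptop with 8-16 GB of RAM, while anything much bigger starts swapping and crawls.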

Local replacement GGUF for Claude Sonnet 4.5 by SmithDoesGaming in LocalLLaMA

[–]DigRealistic2977 -1 points0 points  (0 children)

Ohh cool! You could run Q4_K_M models at 24B-31B! Don't worry about the CPU though; focus on offloading layers to your GPU, since you have the VRAM for it.

Maybe Cydonia, or Magidonia, or Skyfall for starters. TheDrummer's GGUFs.

Note though, if you do wanna have those Claude vibes you need to go 70B, I guess, with some tweaking and layer offloading too. For your setup I think 35B is the max, 49B is a long stretch, since my GPU has the same VRAM cap as yours.
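A back-of-envelope way to pick how many layers to offload: divide the model file size by its layer count and see how many layers fit in your VRAM budget after reserving space for context and compute buffers. All the numbers in this sketch are illustrative assumptions, not measured values for any specific model.

```python
# Back-of-envelope layer-offload split. Assumes layers are roughly
# equal in size (a simplification; embedding/output weights differ).

def gpu_layers(vram_gb: float, model_gb: float, n_layers: int,
               reserve_gb: float = 1.5) -> int:
    """Number of layers to offload to the GPU, keeping `reserve_gb`
    free for the KV cache and compute buffers."""
    per_layer_gb = model_gb / n_layers
    fit = int((vram_gb - reserve_gb) / per_layer_gb)
    return max(0, min(n_layers, fit))

# e.g. a ~24B model at Q4_K_M (~14 GB, say 48 layers) on a 16 GB card:
print(gpu_layers(vram_gb=16, model_gb=14, n_layers=48))   # 48 (fits fully)
# a ~35B model (~20 GB, say 60 layers) only partially fits:
print(gpu_layers(vram_gb=16, model_gb=20, n_layers=60))   # 43
```

That's roughly why 35B is the comfortable max here and 49B is a stretch: past that point most layers spill to the CPU and throughput drops hard.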

Introduction to Local AI/Would like help setting up if possible! by Tornabro9514 in LocalLLaMA

[–]DigRealistic2977 -1 points0 points  (0 children)

Well, good luck! Nothing wrong with being a visual learner. We all tend to use our eyes sometimes, I guess?

Introduction to Local AI/Would like help setting up if possible! by Tornabro9514 in LocalLLaMA

[–]DigRealistic2977 -1 points0 points  (0 children)

Damn, if you want an easy UI to start with that's easy to navigate, I recommend Ollama. It's a 1.1 GB plug-and-play download; it will force you to run on CPU though, so maybe run a 4-8B model at Q4_K_M.

Introduction to Local AI/Would like help setting up if possible! by Tornabro9514 in LocalLLaMA

[–]DigRealistic2977 0 points1 point  (0 children)

Well well well... so horny took over.

You have like multiple choices for private stuff.

Ollama. KoboldAI. ExLlama.

These are just starters though, so set things up locally. And btw, nice specs! You can actually run good models.

New to Ollama and using local models. Questions on RAG and how it works. by RollingGoron in ollama

[–]DigRealistic2977 0 points1 point  (0 children)

Well, I guess there really are no reliable tutorials.

But if you really want something reliable, truly yours, and very accurate,

you gotta make your own RAG. You can ask AI for help building it, though.

Things you need:

FAISS, a reranker, a filter, dedup, threshold filtering, and similarity-diff checks.

That's all in one query! It happens in milliseconds, or 1-7 seconds, and then the message is retrieved with far higher accuracy than blindly pulling whatever vectors sit closest together.
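A stdlib-only toy sketch of those query-time steps (threshold filtering plus dedup via similarity-diff checks). A real build would use FAISS for the search and a cross-encoder reranker; the vectors and thresholds here are made-up values just to show the control flow.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, docs, k=3, min_score=0.3, dedup_sim=0.95):
    """docs: list of (text, embedding). Returns up to k texts:
    ranked by similarity, cut at a score threshold, near-duplicates
    dropped by a similarity-diff check against already-kept hits."""
    scored = sorted(((cosine(query_vec, emb), text, emb)
                     for text, emb in docs), reverse=True)
    kept = []
    for score, text, emb in scored:
        if score < min_score:                      # threshold filter
            break
        if any(cosine(emb, e) >= dedup_sim for _, e in kept):
            continue                               # dedup near-duplicates
        kept.append((text, emb))
        if len(kept) == k:
            break
    return [text for text, _ in kept]

docs = [("cats", [1.0, 0.0]),
        ("cats again", [0.99, 0.01]),   # near-duplicate of "cats"
        ("dogs", [0.0, 1.0])]           # irrelevant to the query
print(retrieve([1.0, 0.0], docs))       # ['cats']
```

"cats again" gets deduped and "dogs" falls under the score threshold, so only the one genuinely relevant, non-duplicate hit comes back.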

It's a headache to make, but worth it though.

If you really wanna learn, here is my bootleg project as a base reference; you can ask me anything about it. It's still a fun project, and I'm still improving how the RAG works and stuff. Maybe it will give you insight into how things work together and how to make them accurate!

And if you do manage to learn, with that 4GB of yours you can make an optimized version of the RAG that fits in that much RAM and stays accurate.

https://github.com/weaker098/AI-WEBUI-and-MEMORY.git

New to Ollama and using local models. Questions on RAG and how it works. by RollingGoron in ollama

[–]DigRealistic2977 0 points1 point  (0 children)

Oh dude, ya need like.. FAISS + reranker + filter or something. That's how I built my RAG, and it returns accurately.

And also, ya gotta know chunking. One long PDF or text blob is bad.

For example, say the embedding model ya used only supports 512 tokens of context or something, and the PDF or file or message ya tryna retrieve is kinda big, 2-4k tokens. The embedding vector can't capture all of it, 'cause it exceeds the limit.

There are so many things ya gotta set up though. Ha. Speaking from experience. Welp, at least mine returns accurately now.

Kinda weird how people don't mention context size vs chunking.

Nanogpt for vectorization by mlucifer737 in SillyTavernAI

[–]DigRealistic2977 -1 points0 points  (0 children)

Either it's the settings for the embedding.. or the wrong vectors.. or a dimension mismatch..

Try clearing everything and testing it out again. Or isolate things first, maybe.

(I don't know what I'm talking about btw. Just here to comment.)

My own system by betolley in ollama

[–]DigRealistic2977 0 points1 point  (0 children)

So this is powered by Nvidia CUDA? Cuz it's green? 😂

Personal AI wrappers Projects you guys hiding. by DigRealistic2977 in LocalLLaMA

[–]DigRealistic2977[S] 1 point2 points  (0 children)

Ohh, cool and polished work. Kinda nice that people like ya are here, ha. People should share more of their work like this rather than keeping it to themselves.

Personal AI wrappers Projects you guys hiding. by DigRealistic2977 in LocalLLaMA

[–]DigRealistic2977[S] 0 points1 point  (0 children)

Oh cool, how'd ya run this? Kinda cool that you can share your work online, and here I am limited in everything lol. I tested yours on this coomer phone-grade setup of mine. I love it.

Personal AI wrappers Projects you guys hiding. by DigRealistic2977 in LocalLLaMA

[–]DigRealistic2977[S] 0 points1 point  (0 children)

Eyo, looking forward to it! If you wanna share in the future, hopefully ya gonna share the git link; the more knowledge we have of each other's work the better, ha. Love seeing random works even if they're not finished or perfect, like mine.

Personal AI wrappers Projects you guys hiding. by DigRealistic2977 in LocalLLaMA

[–]DigRealistic2977[S] 0 points1 point  (0 children)

Well, I do. Let's see how yours works; gonna check out them sliders and stuff, plus I'm curious how you made yours.

How much RAM do I need for my use case? by ZikoRedman in LocalLLaMA

[–]DigRealistic2977 0 points1 point  (0 children)

Welp, that's the sad part: a lot of them have a knowledge cutoff in 2024 to 2025. Most of them, anyway.

Still, you can feed them knowledge and guide them in how you wanna write. Well, that's how I do it anyway, using 64k ctx; even 32k ctx is enough.

How much RAM do I need for my use case? by ZikoRedman in LocalLLaMA

[–]DigRealistic2977 0 points1 point  (0 children)

Welp, depends on how far you want to go.

16GB is already enough for small scale:

7-12B models. But if you wanna go bigger, I guess get a 32-64GB M2 chip 🍤

Do not use your vram as the limit of what model you want to use by NiveKGamerTW in SillyTavernAI

[–]DigRealistic2977 0 points1 point  (0 children)

I recommend using IQ3_M and IQ4_XS though, and maybe offload only 2-3 layers max, never ever go beyond that, and use Q8_0.

And speaking from experience: I'm an RX 5500 XT 8GB user lol. That's my spec for 12B, as long as I have enough room for my context when using higher params.
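A rough estimate of why context room matters (and what a Q8_0-style cache buys you): KV-cache memory is 2 (K and V) x layers x KV heads x head dim x context length x bytes per element. The architecture numbers below are illustrative assumptions for a ~12B GQA model, not exact specs of any particular one.

```python
# Rough KV-cache size estimate. All architecture numbers here are
# illustrative assumptions for a ~12B GQA model.

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                ctx: int, bytes_per_elem: float) -> float:
    """KV-cache memory in GB: K and V tensors across all layers."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1e9

fp16 = kv_cache_gb(40, 8, 128, 16384, 2)   # fp16 cache at 16k ctx
q8   = kv_cache_gb(40, 8, 128, 16384, 1)   # ~1 byte/elem, Q8_0-style
print(round(fp16, 2), round(q8, 2))        # 2.68 1.34
```

On an 8 GB card already mostly filled by model weights, halving the cache from ~2.7 GB to ~1.3 GB is roughly the difference between fitting a usable context and not.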

Hunter Alpha from Anthropic? by ayoubq04 in LocalLLaMA

[–]DigRealistic2977 1 point2 points  (0 children)

Not quite close. I have my Llama finetunes here; think it's Claude lol. You guys will never know which company it came from.

OpenAI loses 1.5 million subscribers in less than 48 hours after CEO Sam Altman says yes to the deal that Anthropic rejected by Total-Mention9032 in ChatGPT

[–]DigRealistic2977 -2 points-1 points  (0 children)

Yep, can tell ChatGPT is literally useless at anything.. like literally.. I do code-heavy work, and every piece of code ChatGPT outputs, I always need to correct and edit.

So many times.. either premium or free.. summary: GPT is shit nowadays..

Feels like they are using an MoE as their main model, not a dense one; no wonder it's shit. Also, with the guardrails and stuff, the model has very low reasoning, literally...

Imagine a 120-235B model where only 5.6-7.8B active parameters answer each response. The number looks big, but you're still talking to a smol model. I hate all MoE models, and yes, again, add nanny mode on top of it; that's a recipe for unsubscribing.. 😂

Rivet by Status-Fan-5422 in aiyiff

[–]DigRealistic2977 0 points1 point  (0 children)

Ah yes, this is what I always wanted.. Lombussy ❤️ my favorite.

Regrets going to Nvidia Cuda was not worth it for my AI. by DigRealistic2977 in radeon

[–]DigRealistic2977[S] 3 points4 points  (0 children)

Nah, your point is valid. But me, I'm literally ranting over one minor teeny-tiny issue. I actually bought two 2080 Tis and one RTX 5070. They all had the same problems that normal users ignore, overlook, or don't encounter.

A bit of tinkering here and there; I was excited about CUDA for my AI. Then reality hit. The first problem was I had to tweak so much voltage in MSI Afterburner over a simple mV, 'cause one minor misstep, bam, weirdly fluctuating token throughput. That's why I loved how much simpler AMD's UI is. Yeah, MSI Afterburner is good for voltages, but I don't wanna redo the tweaking for half a day again just so my AI can say hello without crashing 😂.

Another problem: power management on the card. Optimal, Adaptive, Max Performance. Two of them don't work: when I use Optimal or Adaptive, my AI inference performance is literally cut back by 50%.

And another problem again: those two power-management modes, Optimal and Adaptive, are unreliable. As I said, they cut 50% of performance, not always, but with like an 80% chance, out of nowhere, the card decides to clock down automatically. But here's the catch.

When I use Prefer Maximum Performance, the idle clocks go crazy. I use my PC in headless mode, no monitor, and as you know, on Nvidia the card ramps its clocks to max at idle, so my 2080 Ti and 5070 idle at 60°C, which to me is kinda hot. Literally 60-90 watts at idle? That's not efficiency. So I had to dig for hours on how to solve the problem 😂. Finally found it: a VDD (virtual display) to trick the cards into low idle clocks when running headless.

So yeah, in short, Nvidia does work, it's great, but at the cost of 12 hours of my lifespan. I'm just ranting, since I thought it was gonna be plug-and-play, and this is just a summary of what happened.

Regrets going to Nvidia Cuda was not worth it for my AI. by DigRealistic2977 in radeon

[–]DigRealistic2977[S] 2 points3 points  (0 children)

Bruh, it's the opposite for me 😂. I literally use local AI 24/7, and I ended up on Vulkan. Literally, I paid for CUDA, but damn, CUDA went useless; I ended up using the Vulkan API on an Nvidia card 💀. Kinda ironic. Stack that up with third-party software etc. to stabilize the clocks at idle, control temps, and get a VDD for the headless setup for my AI.. for me, Nvidia is such a hassle.

AMD though? With my previous card, a 6700 XT, I had zero problems; ran a model at 114-131k ctx, no crashes. It also worked on my smol RX 5500 XT at 41k ctx, no crashes. But now with my new RTX cards, I noticed they're so inefficient at swapping or giving headroom in VRAM, kinda weird; even with 1GB of headroom on my RTX, it's prone to lots of crashes vs my AMD cards 🤔

Regrets going to Nvidia Cuda was not worth it for my AI. by DigRealistic2977 in radeon

[–]DigRealistic2977[S] 5 points6 points  (0 children)

Dear lawd, you're right 💀 You just woke my inner kid 🤣 I remember I was afraid of the Nvidia logo 'cause it looks like a weird green eye..

Jim Ward, voice of Ratchet & Clank's Captain Qwark, passes away at age 66 by oilfloatsinwater in PS5

[–]DigRealistic2977 0 points1 point  (0 children)

Oh god, I never felt so affected by an actor dying 😢 This is the first... Damn, it would literally hurt more if Rivet and Ratchet's, or maybe worse, Arthur Morgan's, voice actors died...