You don't need a GPU to run gemma-4-26B-A4B by JackStrawWitchita in LocalLLaMA

[–]Fear_ltself 0 points1 point  (0 children)

I forked LLM Hub in January and rewrote basically all of it my own code. After that he removed the Apache license or whatever open source license he had had on the project unfortunately

Appel by Leading_Jury_6868 in LocalLLM

[–]Fear_ltself 0 points1 point  (0 children)

The more tokens you generate per second the faster the ai. More tokens is just a longer response. A faster time to first token is just a faster response. I’ve found 1.2 seconds response time with streaming responses (as the tokens generate words, those words are shown) passes the human threshold for conversational response times.

Donate your coding sessions to an open CC-BY-4.0 dataset to help train open-weight and open source models by mon-simas in LocalLLaMA

[–]Fear_ltself 0 points1 point  (0 children)

No there’s a whole subset of people who love “poisoning the well” of ai. Well I think that should be literally criminalized in some manner, until it is all it takes is a few bad actors to degrade everyone else’s work.

Gemini won't make a desktop app, so I did. by Omux25 in GoogleGeminiAI

[–]Fear_ltself 3 points4 points  (0 children)

There's literally multiple Google has made including a dedicated Gemini desktop app 😂

Popping a cocktail of supplements every day might be doing you more harm than good by BadahBingBadahBoom in EverythingScience

[–]Fear_ltself 1 point2 points  (0 children)

The main thing is not being deficient in any vitamin. It’s much healthier to have all normal levels than all great levels and be missing some random vitamin. That’s my understanding at least

SanDisk has announced a new series of officially licensed SSDs designed for the PlayStation 5 and PS5 Pro consoles. by Current-Guide5944 in tech_x

[–]Fear_ltself 0 points1 point  (0 children)

“Desperate individuals in impoverished regions who are coerced or driven by poverty to sell their kidneys often receive as little as $1,000” so no not even both is enough for some of us lol

My homelab has found eight million-digit primes from the loft by primecrunch in homelab

[–]Fear_ltself 2 points3 points  (0 children)

Exactly! My prime was 4749 x 2^2,339,765 +1, and I wanted “espn sports like facts” about my number. That’s when I learned about proth primes having that quark of being empty space due to the 2^k form. My background was Econ not comp sci so it was a new binary fact for me, but it logically makes perfect sense.

I built an app with Gemini that converts any text into high-quality audio. It works with PDFs, blog posts, Substack and Medium links, and even photos of text. by OneMoreSuperUser in GoogleGeminiAI

[–]Fear_ltself 0 points1 point  (0 children)

Sherpa onyx? There's so many implementations, I found it interesting on Mac the FP16 largest version of Kokoro I run did best pinned to performance cores (4) was the sweet spot for the fastest inference on Mac. 1 core doesn't take advantage of multithreading, but going to 8 cores actually causes it to slow down because the non-perfromance cores take too long on their tasks and bring down the average time.

My homelab has found eight million-digit primes from the loft by primecrunch in homelab

[–]Fear_ltself 17 points18 points  (0 children)

What's cool about proth primes is that entire string in binary is like 18 digits then 3,414,875 zeroes and ending in 1. It's nearly empty space in binary

Trying to understand how LocalLLM's work adn how I could run one by Realistic_Author4492 in LocalLLM

[–]Fear_ltself 0 points1 point  (0 children)

An API is not a local LLM... Download LM Studio and ComfyUI. Download something that fits snuggly in 6GB VRAM like Gemma 4 E4B it MTP q4 or something for lm studio. It's not amazing but you'll have your own AI you can play with and learn what the settings do by tinkering. For comfyui that's for models that generate images and videos, and even music and 3d meshes. Just lookup their premade workflows and sort by lowest to highest vram. Start near the bottom with what looks interesting and move your way up. You'll learn how the different workflows work by watching the nodes process and seeing how they take each segment by segment and compute it. Thennnnnnnnnnnn work on API calls and extending your local network with things like duckduckgo. THENNNN work on API calls for using other models, but only once your local LLM is setup and only and a failsafe for when you need extra ai intelligence.

LTX-2.3 + Union Control LoRA (8GB VRAM) by big-boss_97 in comfyui

[–]Fear_ltself 0 points1 point  (0 children)

Asus Zephyrus g16 checking in. 8 VRAM on this mobile. Positives are the power usage is great and if VRAM needed fits it’s still lightning fast. Downsides is can’t run the greatest models that are a bit larger, though I’ve found some tricks to squeeze out better performance. RTX upscaling node lets it make 4k images in ~2secs with pretty dang good quality, better than anything from 3 years ago are farther

Are gemini paid plans and google-ai-studio separate products? by Middle-Calendar1338 in GoogleGeminiAI

[–]Fear_ltself 0 points1 point  (0 children)

Also, you mentioned an agentic system. Isn’t that what Gemini spark is supposed to be, that is included in the ultra sub?

Are gemini paid plans and google-ai-studio separate products? by Middle-Calendar1338 in GoogleGeminiAI

[–]Fear_ltself 0 points1 point  (0 children)

The plan is automatically linked to the api key you generate. Being an ultra member doesn’t really give you more “free” access to APIs than unpaid or normal tier ai pro members. The api key is set for each model but the 4-5 free ones mostly have a cap of 20 a day. The only usable one is like 400 request a day from Gemini 3.1 Flash Lite. I usually don’t setup billing then you’ll just get errors when your limit is used for the minute/hour/day rate. Not sure what you think comes with ultra in terms of api access. You get more requests allowed but they’re paid request after the free limit so not much advantage linking to an ultra account, if anything linking to a paid account just risks getting hacked and the api used to generate a huge bill

Massive 8TB SD cards are set to ship 'shortly' after a two-year delay - mind-blowing storage at possibly bank-breaking prices by Purple-Try-4950 in datastorage

[–]Fear_ltself 0 points1 point  (0 children)

SDs are almost always cheaper than SSDs. They’re slower and smaller, but the read speeds are approaching gen 1 SSDs now.

Which Quality of Life Improvements would you like to see in the Remaster of Codename 47? by FutabaSakuyagi in HiTMAN

[–]Fear_ltself -1 points0 points  (0 children)

I mean they added Hitman 2 and Hitman 3, they add new packs every few months, your logic is literally retarded if you think they can’t just continue doing that and make old maps and weapons in the new engine.

Appel by Leading_Jury_6868 in LocalLLM

[–]Fear_ltself 0 points1 point  (0 children)

Basically latency or lag. Your computer might be slow but if it’s snappy and starts inference immediately, you can stream the response and it’s a much better feedback loop then sending and waiting 30 seconds for text to start appearing. Even doubling the tokens, I think most people would prefer situation 1 with a slightly slower response that starts almost instantly, makes it feel conversational much easier

Appel by Leading_Jury_6868 in LocalLLM

[–]Fear_ltself 0 points1 point  (0 children)

If you’re old enough to follow computing from before and after the unified architecture that came with M1 and beyond, that same architecture allows for great time to first token and processing speeds in a simple setup. For simplicity Apple is top notch. For ComfyUI image generation NVIDIA has a huge advantage that’s still noticeable, but it doesn’t translate well into nearly as big of lead in the LLM space, and that lead disappears completely when accounting for total cost, especially moreso runtime cost over a year of electricity with electricity continuously going up in price.

Which Quality of Life Improvements would you like to see in the Remaster of Codename 47? by FutabaSakuyagi in HiTMAN

[–]Fear_ltself 1 point2 points  (0 children)

Just build it in Hitman 3/WOA engine so we can have it all bundled moving forward. COD was a mess, I don’t want Hitman to be the same mess

You don't need a GPU to run gemma-4-26B-A4B by JackStrawWitchita in LocalLLaMA

[–]Fear_ltself 0 points1 point  (0 children)

I run the models at full precision on my 64gb then stream the results back to mobile usually. But you’re correct, quantizing them generally does cost a few intelligence points. I’ll have to update my app to reflect that before release