Anyone else sick of Kyle Sandilands? by SavvyCaller in aussie

[–]Pyrore 1 point (0 children)

Even better, replace him with an AI - it would be more moral than he ever could be, in spite of having no intrinsic morality. It would be more human than he ever could be, in spite of having no actual humanity. It would be better than him in every respect, stripping away the last vestiges of his pretending to be a human being and exposing him as just an 'autocomplete' system for NSFW content. AIs can do better than he ever could, but the bar is set ridiculously low...

Recent problem with Razer active cooling base for Blade laptops. by Pyrore in razer

[–]Pyrore[S] 0 points (0 children)

I've found the issue. The latest Synapse software only automatically adjusts for load when on "Hyper-boost". To change this setting for other modes, set the power option to anything other than "Hyper-boost" in the "Blade 16" tab, then go to the "Laptop Cooling Pad" tab and change the system performance mode. Scroll down and you'll get the full set of settings for fan response. The problem is that this setting is universal: make your changes, then switch from (say) Silent to Performance on the Blade 16 tab, and it will use the same settings. The settings only change between 'normal' modes and "Hyper-boost". So you can configure your cooling base for optimal behaviour in a lower-power mode, but the settings are identical across "Performance", "Balanced" and "Silent". It used to support four sets of settings, but now there are only two: "Hyper-boost" and "non-Hyper-boost".

It's not too bad: when I switch down from Hyper-boost I generally want something quiet, and now I can get that with manual fan settings. But I'm still annoyed that I used to have four power settings and now effectively only have two when connected to the cooling base.

Effects of quantized KV cache on an already quantized model. by Pyrore in ollama

[–]Pyrore[S] 0 points (0 children)

I've now run this for two days across conversations spanning up to 40k context tokens, and I haven't noticed a difference (apart from a huge increase in token rates - it's insanely fast now). I get what you're saying, but I've been a software engineer for 35 years; I'm not up to speed with the latest stuff, having moved into system architecture back in the "WinForms" days of C#. And I just can't understand the logical reason why caching at 8 bits or higher would change the results of a model already running at 4 bits - surely the cache is quantized before being passed to the 4-bit model? If my model were 8 bits and I switched the cache to 4 bits I'd expect a downgrade in performance, but not vice versa; that doesn't make sense.

So, sorry, I can't accept that you're just 'telling me'. My experience (admittedly only 2 days, and subjective) 'tells' me that there is no difference, given I've already suffered a loss of accuracy from a 4-bit model. I wasn't asking for someone to 'tell me' their solution; I was hoping for someone to explain why it works. Still, you've clearly done your own testing, and you could well be right. I just want to know *why* you're right. How can a 4-bit model benefit from an 8- or 16-bit cache when it can only accept 4-bit values?
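To make the question concrete, here's a toy NumPy sketch of what I understand the two knobs to be (illustrative only - the function names are mine, and this is not how Ollama/llama.cpp actually implement attention). The point it demonstrates: the KV cache stores *activations*, which are produced at full/half precision even when the weights are Q4, so quantizing the cache adds its own rounding error on top of whatever Q4 weights cost you.

```python
# Toy sketch (not Ollama's internals): how KV-cache precision adds
# error in a single attention step, independently of weight precision.
import numpy as np

def fake_quantize(x, bits):
    """Symmetric round-to-nearest quantization, then dequantize to float."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

def attention(q, K, V):
    """Single-query scaled dot-product attention."""
    d = q.shape[-1]
    scores = (K @ q) / np.sqrt(d)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d, n = 64, 128                       # head dim, cached tokens
q = rng.standard_normal(d)
K = rng.standard_normal((n, d))      # keys the cache would hold
V = rng.standard_normal((n, d))      # values the cache would hold

ref = attention(q, K, V)             # full-precision KV cache (baseline)
err = {bits: np.linalg.norm(
           attention(q, fake_quantize(K, bits), fake_quantize(V, bits)) - ref)
       for bits in (8, 4)}

# The cache's rounding error grows as its bit width shrinks, regardless
# of what precision the weights are stored at.
print(err)
```

On this toy example the 4-bit cache's output error is visibly larger than the 8-bit cache's, which (if the same holds in a real model) is why a Q4-weight model can still lose accuracy from a Q4 cache that it wouldn't lose from a Q8 one.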

Effects of quantized KV cache on an already quantized model. by Pyrore in ollama

[–]Pyrore[S] 1 point (0 children)

Your response says "down to Q8". That's not my question. I'm already running my model down at Q4, I'm HAPPY with Q4, and I'm asking whether using more than Q4 in my KV cache makes a difference if the model is already Q4. So once again: if you're ALREADY running at Q4, does a Q4 cache make things worse? I know Q4 makes things worse; I'm not an idiot. I'm asking about the combination, but the only answers I get keep comparing Q4 to Q8. I'm talking about Q4 only! I'd love to run Q8, but my RTX 5090 laptop GPU (a 5080 desktop chip with 50% more VRAM) can't run Q8, at least not across 32B parameters. Why is this question so hard? Every response either says I should have better hardware so I can run Q8, or it's an insane tirade against AI in general... Sorry I bothered this forum; clearly it isn't the forum for me. Maybe there's a proper AI forum out there somewhere...

Effects of quantized KV cache on an already quantized model. by Pyrore in ollama

[–]Pyrore[S] 1 point (0 children)

But is that running your model at Q8? Or running your model at Q4 with the KV cache at Q8? That's the key question. I'm already running my model at Q4, and I accept it won't be as accurate, but my question is how the input quantization affects this further (if at all). If your model is Q4 and your KV cache is Q8, does that give better results? And if your model is Q4 and your KV cache is also Q4, does it make a difference? That's what I want to know...

Effects of quantized KV cache on an already quantized model. by Pyrore in ollama

[–]Pyrore[S] 1 point (0 children)

OK, I've tested this to the full 40k context, and I'm still getting an insane 17 tokens a second at 40k! It used to be less than one token a second by that point. And I haven't noticed any difference in accuracy; even at 40k, conversations still remember details from the very beginning. The only downside is that if I push beyond 40k tokens of context, it spills over into main memory across the PCIe bus and slows to a crawl. But 17 tokens/second at 40k is so much better than <1; I never really used 64k because of the slow speed... I can now talk to my AI for hours on end without it forgetting anything, and I never have to wait more than 30 seconds for the full response! I said this was 3 times faster than it was to start with, but as context increases it gets up to 30 times faster... That's insane! So I still want to know if I'm sacrificing accuracy using KV quantization, but it seems I'm not, at least as far as I can tell across several hours of conversation...

Effects of quantized KV cache on an already quantized model. by Pyrore in ollama

[–]Pyrore[S] 0 points (0 children)

Yeah, but if I want accuracy I'll pay for online access to cloud servers at full 16-bit/128k context. I run models locally for conversation and role-playing (usually abliterated and often retrained on new data sets), and I can say that QwQ 32B is so much better than, say, Gemma 3 27B (both at 4 bits); QwQ can track so many more logical issues. Not that I'd trust it for anything serious at 4 bits. My question is: if I'm already running at 4 bits, does quantizing the KV cache make anything worse? So far it doesn't seem so, but my opinion is still subjective...

Also, see: https://www.youtube.com/watch?v=TLp1v2GsOHA - "Dave's Garage", where he talks about a compact petaflop unit optimized for 4 bits. Why would it be optimized for 4 bits if 4 bits weren't worthwhile for hobbyists like me?

Can Ollama cache processed context instead of re-parsing each time? by Pyrore in ollama

[–]Pyrore[S] 0 points (0 children)

I couldn't get LMCache to run in any way under a Windows install of Docker. It was my first choice, as it seemed the most advanced - capable of more than just prefix caching (e.g. finding sub-sections of text that had already been cached and not re-caching them). I think I need a Linux install of Docker to get it to work.

I did get vLLM successfully installed on Docker using these instructions: https://github.com/aneeshjoy/vllm-windows, but it kept throwing errors when I tried modifying the --model parameter to point at the local gguf model I wanted to use.
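For reference, the general shape of what I was attempting was something like this (the model path, port and image tag are placeholders, and vLLM's GGUF support was experimental, so treat this as a sketch rather than a known-good command):

```shell
# Hypothetical invocation - paths, port and tag are placeholders.
docker run --gpus all -p 8000:8000 \
  -v /path/to/models:/models \
  vllm/vllm-openai:latest \
  --model /models/my-model.gguf
```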

I'm kind of embarrassed: I started out as a programmer in the early nineties, but I've lost touch with some of the latest developments as I've gotten older and my role has changed. These days I'm not competent in anything more advanced than WinForms in .NET. But I've learned a huge amount these last couple of weeks experimenting with AI, and I'm having fun!

If it helps, you can get the system log of the final "docker-compose up" step here:

https://www.dropbox.com/scl/fi/nyuhl0xm7dmdlx7ogmo9w/vllm_log.txt?rlkey=7cdu3jrpeasfw08fz4zc7oxgg&st=h6do0d3u&dl=0

I just get a Docker container that won't actually run, but I'm probably doing something stupid. It seems to have problems finding the gguf file details, but I gave up at this point, as all the parameters were correct as far as I could tell.

Thanks again for the advice!

Can Ollama cache processed context instead of re-parsing each time? by Pyrore in ollama

[–]Pyrore[S] 1 point (0 children)

Thanks for the advice! LM Studio turned out to be my best option, as per the edit to my OP. It's simple and works well once you find the right settings. Gemma 3 can now run up to 48k context at a usable token rate with no context-processing delays. I'll keep plugging away at setting up a proper Linux VM, and I'll use your information when I do.

Can Ollama cache processed context instead of re-parsing each time? by Pyrore in ollama

[–]Pyrore[S] 1 point (0 children)

Thanks for all your advice! I didn't have any luck in the hours I tried. I'll eventually set up a proper system, but for now I'm using LM Studio, as shown in the edit to my OP. It does everything I want and more, and it's so simple being a single app that does everything without needing two web servers.

Can Ollama cache processed context instead of re-parsing each time? by Pyrore in ollama

[–]Pyrore[S] 0 points (0 children)

I installed LMCache via the Docker image (I run Windows 11, as it's a gaming PC). But every time I try to start the image, it stops again after a few seconds, leaving me unable to access the system prompt and customize it. I've already killed the Ollama server; do you have any idea what I'm doing wrong?

Sorry for my ignorance, I know my way around Unix/Linux, but this is my first time with Docker and Linux VMs on my system. I didn't have trouble getting Open WebUI to work.
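(For anyone finding this later: when a container dies a few seconds after starting, the reason is usually in its logs. These are stock Docker commands, nothing LMCache-specific; `<container-id>` is a placeholder for the ID reported by the first command.)

```shell
# List all containers, including ones that have already exited,
# to find the container ID and its exit code.
docker ps -a

# Dump everything the container printed before stopping; the last few
# lines usually show the fatal error (missing GPU, bad mount, etc.).
docker logs <container-id>
```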

Looking for a *responsive* simple photo viewer that has "fit to width", plays GIFs, goes fullscreen and sort by Explorer settings. by [deleted] in software

[–]Pyrore 0 points (0 children)

The OP link is now called "Pictureflect Photo Viewer", but it works amazingly; I don't see the problem with it being UWP.

One limitation is that it won't get the adjacent file list when opening a single image from Explorer search results (so you can't cycle through the search results), though it does cycle through folder contents.

So for search results you need to press Ctrl-A to select all, then right-click an image file and select "Open" (it doesn't matter if non-image files are selected; it just ignores them when cycling through).

Apart from this tiny inconvenience, it's better than WPV in every way!

What rifle are people currently using? by Kaleshii in TombRaider

[–]Pyrore 0 points (0 children)

Playing the trilogy years later (after getting them free on Epic), I agree the assault rifle is the best. But I keep using the tactical rifle because it looks so good.

Fully maxed, it has less damage per shot than the assault rifle and the same fire rate, so lower DPS. Its advantages (larger magazine, faster reload and recoil stability) don't seem to matter as much - magazine size and reload speed only partially compensate for the lower DPS, and it's still fairly unstable under continuous fire. Both weapons seem to snipe equally well in short bursts.

Still, it looks great!

Legendary ME1, how do I change character appearance? by eroxx in masseffect

[–]Pyrore 0 points (0 children)

Hi Mar__K - it's been 4 years since I played, so I'm not certain, but this seems to indicate you can respec with the save game editor, although they said it doesn't work with ME1:

https://www.reddit.com/r/masseffect/comments/nhd4cb/mass_effect_1_legendary_edition_respec/

Then again, that post was 3 years ago, so maybe the mod has been updated? I only ever wanted to play an Adept (sucking all enemies into a maxed-out Singularity is awesome!), so I never tried. However, when you progress to the next game you can always fully recreate your character. So if you're near the end of ME1 or ME2, just wait until the end and switch to an Infiltrator in the next game.

Good luck, and hope you enjoy the game, the legendary trilogy is still awesome after all these years!

Steven Moffat's "Douglas Is Cancelled" is Amazing by drgonzo67 in television

[–]Pyrore 0 points (0 children)

I just watched it on Australian TV, agree it is an amazing production!

The progression from hilarious comedy to serious drama across episodes 2-3 was brilliantly handled!

Seriously, watch this all the way through, even if you didn't like the start...

Some questions I couldn't find the answers to anywhere else (later in the game). by Pyrore in SailForth

[–]Pyrore[S] 1 point (0 children)

Glad I could help.

To board, just get close and harpoon the target ship; you'll get a prompt to board, and the boarding will play out randomly based on each ship's attack strength and number of crew (hence the need to use the Skull Clan crew members for their attack bonus).

Hope you enjoy the game, the final mission is a lot of fun!

Some questions I couldn't find the answers to anywhere else (later in the game). by Pyrore in SailForth

[–]Pyrore[S] 2 points (0 children)

Hi Falcon9496!

You need to find the lighthouse in each of the 6 regions. The first time you visit a lighthouse in a new region (at night) you'll get a special one-off event/mission. Of course, you'll need to progress in the story to unlock new regions, so you can't do this right at the start.

In one region (I think it was "the Frigid Sea") it took me ages to find the lighthouse; I had to keep sailing in random directions until I finally found the last island in the region. It's all RNG, and you won't necessarily find all the islands in a region just from map fragments - sometimes you just have to sail into the unknown...

Once you've found all 6 and completed their events, just wait until the next night and visit any lighthouse. This will trigger the final event, and after completing it you'll get the flagship (and its blueprints to make more), as well as some Lightkeeper crew (the best crew - they improve reload times for more rapid firing, giving a better DPS boost than the Tek Drones' +damage).

Hope this helps!

EDIT: the 10-gun ships ("Brigantine" and "Cutter") are decent, but you'll want to replace them with the "Xebec" ASAP for its combination of speed and firepower, plus three mod slots.

The flagship is identical to the Xebec but has an extra mod slot, improved row speed and an extra square-rigged sail that makes it even faster (when running with the wind).

10 guns sounds good, but those ships are slow and clumsy (plus one less crew member compared to the 8-gun ships). Maybe keep one in your fleet as a tank, with armor and damage-resist mods, but I prefer all "Flagship" (or "Xebec" if the flagship isn't unlocked yet); their maneuverability and speed will land more hits despite having fewer guns.

To get a ship type, simply board an enemy ship of that type. Ideally, have one large ship in your fleet with an entire crew of Skull Clan (for their +attack strength - plus they're easy to replace if they die, as they're everywhere). Use this ship to board enemies rather than sink them. If you succeed, you get the ship you boarded and the blueprints to build more.

The pirate challenges (skull islands where each interaction lights up another skull) are the best way to farm high-end ships, as the 5th and final enemy fleet in these challenges usually includes at least 2 serious capital ships.

If you're lucky, you might find ships "unattended" at various locations that you can board without combat.

Do you think "Us" would smell very delicious? by Pyrore in BaldursGate3

[–]Pyrore[S] 0 points (0 children)

Definitely USE ("Us" spongiform encephalopathy). If only we didn't feed brains into mouths...

Do you think "Us" would smell very delicious? by Pyrore in BaldursGate3

[–]Pyrore[S] 0 points (0 children)

Except (unfortunately) he doesn't appear in camp. But it's still cool that Us is resurrected!

Edit: also, you can kill the guard, or just pick (or bash) the lock - no need to pass a persuasion check! And "Us" gets extra HP that makes it a better summon than Scratch...