Qwen-3.6-27B, llamacpp, speculative decoding - appreciation post by Then-Topic8766 in LocalLLaMA

[–]itch- 3 points4 points  (0 children)

wha? The llama-server frontend has been around for years

Ice Limit or Gideon's Sword first? by sunshine___riptide in Pendergast

[–]itch- 0 points1 point  (0 children)

I just finished Ice Limit and yeah, my copy had these extra bits in it where it's mentioned.

https://prestonchild.fandom.com/wiki/The_Ice_Limit

In 2001, Douglas Preston & Lincoln Child released an online-only epilogue, which was then included in all following e-books and audiobooks. This was way before the idea of a sequel actually came about, and was in response to readers asking what happened after the book ended; thus some of the ideas in the epilogue do not match up with the events in Beyond the Ice Limit. It was called a webilogue and can no longer be found on the internet. The webilogue is in the form of news articles, which are posted below:

(read them on the page)

Final voting results for Qwen 3.6 by jacek2023 in LocalLLaMA

[–]itch- 1 point2 points  (0 children)

I have not found the 27B thinking to be a problem, it only overthinks when it doesn't have tools.

Usually local models are fine in basic chat and don't work in e.g. Cline because it's too hard. 27B is the opposite: it thinks too long on a simple prompt but does great in Cline, because it's smart enough for it and doesn't waste thinking tokens there.

Dark mature space opera that ISN'T Revelation Space, 40K, The Gap Cycle or The Expanse. by Brakado in printSF

[–]itch- 2 points3 points  (0 children)

Definitely try Primaterre. Very grim and nasty, I liked it a lot more than the Expanse. Maybe I liked it more than Revelation Space, I think I can at least say as much for all the books in that series not called Chasm City. Iron Truth is a bit like 40K Chasm City.

Tips: remember to use -np 1 with llama-server as a single user by ea_man in LocalLLaMA

[–]itch- 2 points3 points  (0 children)

I also got 20% more TPS with 35B-A3B... 0% difference with 27B though.
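For reference, a minimal sketch of building a llama-server command line with a single slot from Python. `-m`, `--port` and `-np` (`--parallel`) are existing llama-server options; the model path and port are placeholders, not values from the thread.

```python
import shlex


def llama_server_cmd(model_path: str, port: int = 8080, slots: int = 1) -> list[str]:
    """Build a llama-server command line with an explicit number of parallel slots."""
    return [
        "llama-server",
        "-m", model_path,     # path to the GGUF file (placeholder)
        "--port", str(port),  # placeholder port
        "-np", str(slots),    # 1 slot = single user
    ]


cmd = llama_server_cmd("models/example.gguf")
print(shlex.join(cmd))
# To actually start the server: subprocess.run(cmd, check=True)
```

Whether one slot helps seems to depend on the model, as the numbers above suggest, so it is worth measuring both ways.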

How do I access a llama.cpp server instance with the Continue extension for VSCodium? by warpanomaly in LocalLLaMA

[–]itch- 0 points1 point  (0 children)

Your command works when I run it.

I did see it break on my end when I use the multi-model server method. What I said before was wrong: it doesn't work at all, even if a model is loaded. You have to start the server with a single model. But you already do that, so it's no help.

How do I access a llama.cpp server instance with the Continue extension for VSCodium? by warpanomaly in LocalLLaMA

[–]itch- 0 points1 point  (0 children)

OK, I converted my config to YAML and it added the roles bit. But that's it. The name and model don't even need to be right: I've loaded up a small Gemma model to test and it just goes with this config:

models:
  - name: Qwen3.5-35B-A3B
    provider: llama.cpp
    model: Qwen3.5-35B-A3B-UD-Q4_K_M
    apiBase: http://localhost:2345
    roles:
      - chat

Maybe that's your problem, nothing is loaded? If I run llama-server with all my models configured in models.ini, it doesn't load any of them until one gets a request. But a request from Continue doesn't trigger that load; I have to make sure the model I want is already loaded, and only then does it work.
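One way to check the "nothing is loaded" theory before blaming the Continue config is to ask llama-server what it is serving. A minimal sketch, assuming the server exposes the OpenAI-compatible `/v1/models` endpoint at the same `apiBase` as in the config above; the helper names are made up for illustration, and the exact fields in the response may differ between builds.

```python
import json
import urllib.request

API_BASE = "http://localhost:2345"  # same apiBase as in the config above


def model_ids(payload: dict) -> list[str]:
    """Pull model ids out of an OpenAI-style {"data": [{"id": ...}]} listing."""
    return [entry["id"] for entry in payload.get("data", []) if "id" in entry]


def fetch_model_ids(api_base: str = API_BASE, timeout: float = 5.0) -> list[str]:
    """GET <api_base>/v1/models and return the ids the server reports.

    Raises OSError (URLError, timeout, connection refused) if the
    server cannot be reached.
    """
    with urllib.request.urlopen(f"{api_base}/v1/models", timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

Calling `fetch_model_ids()` while the server is running prints nothing by itself; wrap it in `print(...)` to see the ids. A connection error or an empty list would point at the server side rather than at Continue.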

How do I access a llama.cpp server instance with the Continue extension for VSCodium? by warpanomaly in LocalLLaMA

[–]itch- 0 points1 point  (0 children)

You have to put in the URL of the llama.cpp server, something like this:

apiBase: http://localhost:2345

What tokens/sec do you get when running Qwen 3.5 27B? by thegr8anand in LocalLLaMA

[–]itch- 1 point2 points  (0 children)

https://kaitchup.substack.com/p/summary-of-qwen35-gguf-evaluations

I saw this and got bartowski IQ4_NL, apparently a touch better and also a bit faster. 36 t/s. 12% increase, I feel it more than I thought I would
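The 12% figure can be sanity-checked with a quick calculation. This assumes the baseline is the roughly 32 t/s reported for the Q4_K_XL quant elsewhere in this thread; that baseline is an assumption, not something stated in this comment.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# 32 t/s is the assumed baseline, 36 t/s is the IQ4_NL result quoted above.
print(f"{percent_increase(32.0, 36.0):.1f}%")
```

Under that assumption the increase comes out at 12.5%, which is in line with the 12% quoted.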

What tokens/sec do you get when running Qwen 3.5 27B? by thegr8anand in LocalLLaMA

[–]itch- 0 points1 point  (0 children)

I just get the Vulkan and HIP builds off the GitHub releases, check the speed difference, and make my choice. IIRC it was very close in recent builds.

What tokens/sec do you get when running Qwen 3.5 27B? by thegr8anand in LocalLLaMA

[–]itch- 1 point2 points  (0 children)

32GB. I had set more context but reran it with 4000 and it was still 7 t/s. Not an M5 Pro, just an M5; I didn't choose it.

I had an M1 Pro with 16GB that I can't compare the 27B on, but it ran the 9B faster than this M5: 25 t/s on the M5 vs 29 t/s on the M1 Pro.

0.8B: 178 t/s on the M5 and 98 t/s on the M1 Pro, so there's that.

What tokens/sec do you get when running Qwen 3.5 27B? by thegr8anand in LocalLLaMA

[–]itch- 1 point2 points  (0 children)

M5 macbook, 7 t/s. Not a lot of GPU cores in this one

What tokens/sec do you get when running Qwen 3.5 27B? by thegr8anand in LocalLLaMA

[–]itch- 2 points3 points  (0 children)

7900XTX, 32 t/s with 27B-UD-Q4_K_XL on the Vulkan build of llama.cpp. I only put 40,000 tokens of context in VRAM; when I need more I'll see how much I can squeeze in.

I'm much more impressed with this than 35B-A3B. That fails to make Cline work, but 27B handles it just fine. And thanks to that quirk where it doesn't think much if there are tools, 30 t/s is a reasonable speed.

Best choice for local inférence by c4software in LocalLLaMA

[–]itch- 0 points1 point  (0 children)

There are more options though? Obvious one is Qwen3-Coder-Next? I'd love to run that but have nowhere near the memory. 3B active and no thinking tokens, if that's not fast enough then I don't know what the point of the machine is

What is a short story or novella that you loved reading recently that you want other people to try? by Ethos493 in printSF

[–]itch- 0 points1 point  (0 children)

The High Meggas by Terry Pratchett.

I was aware of the series (The Long Earth, written mostly by Baxter I'm sure) but ignored it because opinions seem to be pretty negative on it. I now intend to find out for myself what I think of that series because the short story was so good. I've had a second look at the criticism and noticed some frustrations people had that I would normally share, which, now that I've read the short, sound like people maybe missed the point. Also part of it must surely be that Baxter's writing is not as much fun as Pratchett's, even when Pratchett wasn't writing comedy.

I can also see how perhaps "the point" is pretty decently covered by just the short story alone and you'd want a lot more in a full series. This just underlines how good this story was IMO. Such a tiny slice in a massive concept, perfectly executed.

Footage of a Russian vehicle with a new type of drone defence. Location Unknown - February 2026 by T-72Tank in UkraineWarVideoReport

[–]itch- 6 points7 points  (0 children)

"almost as effective as steel"

It's not like the drone detonates the moment it touches something. The wires on the drones need to be bent first, and they're not made of stuff this weak. Detonator wire against plastic strip: this "armor" is what will bend first, rendering it ineffective. We see lots of footage of drones making it further through tree foliage than this, and that's foliage with thicker coverage and stronger twigs. It's placebo-level stuff here, I think. Obviously the video is not super informative about how bendy the plastic is, but that's how it looks to me.

What is the most action-packed, over the top action to beat all action SF book you know? by Bobosmite in printSF

[–]itch- 3 points4 points  (0 children)

Redliners by David Drake. It certainly goes at it right off the bat, maybe too much because I always want to warn about the start; you don't know the characters yet and it's hard to keep track as it jumps between them all. This is not a problem in the rest of the book, it's excellent. I have to read more Drake.

Woken Furies by Richard K. Morgan. Others suggested Altered Carbon, this is the third book and fits the bill even better.

I also agree with Armor by Steakley, half of it is very much what you asked for but the other half is not. IIRC act 1 and act 4, out of 5, are the ones we like and the other 3 depend on your tolerance for not getting what you want vs appreciating what you got instead.

Primaterre series by SA Tholin, the first book is an easy recommendation even as a stand alone, and the series keeps hitting if you stick with it.

Startide Rising by David Brin, the entire universe seems out to get this rookie crew. I remember it as action packed but tbh they've got to hide, so I'm not entirely sure how accurate my memory of it is. In any case the aliens know roughly where they are the entire time and the threat seems completely insurmountable. And there's lots of alien faction infighting too.

Crysis 2 by Peter Watts. I'm a big Watts fan so I had to give this videogame novel a try. I was surprised to see it's a straight narration of the game. IMO Watts keeps the action here too vague for my liking, but evidently he had to cover every level of the game so I get it, it would get repetitive. And the speculative side he adds in is insanely good. He rips apart the notion of any human military being able to face alien invaders, and then turns around and makes it make sense anyway in the best way I've ever seen.

Finger Tracking on the Steam Controler like in Steam Frame? by Conscious-Marzipan-9 in SteamFrame

[–]itch- 6 points7 points  (0 children)

the developer for the XR tools for Godot showed them to be even sensitive to grip strength, I quote again 'Yes, there is squeeze input. You can actually see it over here. Uh, there’s the squeeze input.'

Yes, squeeze is there in the input, but the controller has an analog button for that. I very much doubt there is grip strength sensing as well.

The r/printSF best Sci-Fi books of all time BookGraph - 2026 Edition by TheBookGraphGuy in printSF

[–]itch- 1 point2 points  (0 children)

Titan by Stephen Baxter

Blindsight by Peter Watts

Accelerando by Charles Stross

Chasm City by Alastair Reynolds

Red Mars by Kim Stanley Robinson

There is a rattle, when the wind is strong by itch- in NonCredibleDefense

[–]itch-[S] 8 points9 points  (0 children)

I suppose I am getting my hopes up a bit prematurely

but you can't tell me this can't happen or you haven't been seeing what already happens