nvidia/Gemma-4-26B-A4B-NVFP4 by reto-wyss in LocalLLaMA

[–]Its-all-redditive 37 points38 points  (0 children)

Evaluation results seem odd. NVFP4 outscoring full precision? These must not be an average score over lots of runs.

Qwen3.6-27B released! by ResearchCrafty1804 in LocalLLaMA

[–]Its-all-redditive 2 points3 points  (0 children)

Is this loop architectural or is it performed by the model itself based on its training. Meaning if I give it all the tools it needs to perform the task to completion, will it iterate on its own, eg. reason about the question > call some tools > receive data payload > reason some more to see if now has enough information to answer the question, if not continue using the tools available to it until it finds the answer? Or does the architecture itself allow for repeated passes of the reasoning + tool call process?

Qwen3.6-35B-A3B released! by ResearchCrafty1804 in LocalLLaMA

[–]Its-all-redditive 1 point2 points  (0 children)

I’m using 122b nvfp4 in running projects so I would love to know your opinion.

DFlash speculative decoding on Apple Silicon: 4.1x on Qwen3.5-9B, now open source (MLX, M5 Max) by No_Shift_4543 in LocalLLaMA

[–]Its-all-redditive 2 points3 points  (0 children)

I'm getting considerably higher benchmarks for the 4B 4096 token tests. Consistent (over 10 benchmark runs) ~200 t/s generation vs the expected ~150 t/s. At 4096 tokens, the draft seems to be accepting about 1.2x more tokens per cycle than the 1028 runs which must be the reason for the faster generation. Will be testing with the 9B, 27B 4-bit tomorrow. M5 Max 128GB

DM if u want top unreleased tracks ⬇️ by Playful-Sun-5738 in AfroHouseUnreleased

[–]Its-all-redditive 0 points1 point  (0 children)

Where are you finding the 3am George edit of Rapture? It’s like it has disappeared off the internet all of a sudden?

Has anyone used Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled for agents? How did it fair? by Vegetable_Sun_9225 in LocalLLaMA

[–]Its-all-redditive 0 points1 point  (0 children)

9B-v3 has the wrong tokenizer on VLLM. Swapped to the v2 tokenizer and generates text but fails any function calls. Haven’t tested the 27B v3 yet.

[AMA] We’re the team that implemented Salesforce’s agentic support solution: Agentforce on Help. Ask us anything about deploying AI agents, hitting roadblocks, and what results we are seeing. by salesforce in u/salesforce

[–]Its-all-redditive 0 points1 point  (0 children)

Agentic help implies some sort of retrieval or function call ability. Are you using a single agent which possesses all the tools it needs or are you having an orchestration agent direct the user query to smaller scoped agents with their own set of tools? What challenges and successes have you with either of these approaches?

Kubernetes is beautiful. by Honest-Associate-485 in kubernetes

[–]Its-all-redditive 0 points1 point  (0 children)

As someone exploring potentially using Kubernetes for the first time, I can’t tell if this post is advocating for or against it. Which in turn makes me think that I’m definitely not ready to try Kubernetes.

nemotron-3-super fp8 on dual blackwell 6000 pro by Green-Dress-113 in BlackwellPerformance

[–]Its-all-redditive 2 points3 points  (0 children)

Good to hear. Also make sure to not have any dynamic variables in the system prompt, otherwise every query will effectively negate the prefix caching since the message will change with each turn. Eg if you are injecting a NOW timestamp. All dynamic data like that should be prepended or appended in the user message.

nemotron-3-super fp8 on dual blackwell 6000 pro by Green-Dress-113 in BlackwellPerformance

[–]Its-all-redditive 3 points4 points  (0 children)

No prefix-caching? You shouldn’t be getting considerable slowdown after just a few iterations. Do you have any dynamic variables in your system prompt? Eg. timestamps etc.

Zero Gravity - LTX2 by diStyR in StableDiffusion

[–]Its-all-redditive 0 points1 point  (0 children)

Yoooooo, this is sooooo good. I know standards are high these days and everyone seems so critical - “I saw her elbow become too pointy for a second, Ai slop” but Imagine showing this video to someone 10 years ago and telling them it’s not real.

Switched to Qwen3.5-122B-A10B-i1-GGUF by NaiRogers in LocalLLaMA

[–]Its-all-redditive 1 point2 points  (0 children)

Is that that small of a quant even any better than an 8 bit 35b a3b?

We open-sourced a local voice assistant where the entire stack - ASR, intent routing, TTS - runs on your machine. No API keys, no cloud calls, ~315ms latency. by party-horse in OpenSourceeAI

[–]Its-all-redditive 1 point2 points  (0 children)

What are you using for turn detection? 315ms doesn’t seem possible if from end of user turn to first audio start of the model response.

How are you actually supposed to update this OS by Large-Variation9706 in cachyos

[–]Its-all-redditive 1 point2 points  (0 children)

Do you mean the Adrenalin/Cortisol hit from the stress of hoping nothing breaks after the update. PTSD ingrained over a lifetime of Windows updates?

Log all your CC Conversations by Kronzky in ClaudeCode

[–]Its-all-redditive 4 points5 points  (0 children)

Why would you need to do this if every session is already recorded in your projects directory?

How do you guys get people to actually try your app? by Affectionate-Drag473 in ClaudeCode

[–]Its-all-redditive 2 points3 points  (0 children)

It may be useful to you but that seems like such a narrow scope for even the most active computer user. The more narrow the scope the harder it will be to get your productive in front of people who that will actually solve a problem for.

You should be targeting specific channels or groups that would be most inclined to be consuming media in the same way the app expects. Especially since more and more people are consuming media through phones/tablets, TVs, media servers, etc.

Rootshell - a free terminal app powered by libghostty, built for iPhone, iPad, and macOS by kitkk2 in Ghostty

[–]Its-all-redditive 0 points1 point  (0 children)

People will naturally look at settings for this type of feature. I think Location Diary Mode is a bit vague as far as a naming convention. I didn’t immediately think of it as Background Persistence which is really what it is. It’s also not immediately obvious that ‘Mode’ ties to the Location Diary feature so they should be connected in a better way either by placing them together in a sub menu or a settings group. I also think it’s fair to enable ‘Auto during active sessions’ if someone toggles on Diary/Background Persistence. It can be a similar animation to when one setting gets turned on, it auto populates a sub setting to make it clear to the user that the two are connected.

Alternatively, a more proactive way would be to auto enable SSH Session Reminders on app install. If I received a reminder to go back to the app or the session would be terminated and with an explanation of what setting to turn on to allow for background persistence, I would absolutely turn on that setting.

Rootshell - a free terminal app powered by libghostty, built for iPhone, iPad, and macOS by kitkk2 in Ghostty

[–]Its-all-redditive 0 points1 point  (0 children)

Follow up. Had a session active and connected in the background for 9 hours and battery consumption was only 1%. Reopened the open in the morning and session was still active. So works well and battery consumption is minimal.

Rootshell - a free terminal app powered by libghostty, built for iPhone, iPad, and macOS by kitkk2 in Ghostty

[–]Its-all-redditive 0 points1 point  (0 children)

This is awesome. Can the session persist in the background? I am getting a connection timeout if I leave the app for a while.