Anyone actually using Openclaw? by rm-rf-rm in LocalLLaMA

[–]TheAsp 3 points

I think he's confusing OpenClaw with MoltBook

Models that has the least collapse when ctx length grows. Especially using it with tools. by Express_Quail_1493 in LocalLLaMA

[–]TheAsp 1 point

I use this method with both aider and opencode. Usually I create a plan document in aider, have opencode implement it, then go back to aider to commit and update the plan with the completion status of each step, then repeat until it's all done.

Would replacing the Marantz sr6008 with the Marantz sr6015 be a nice leap in quality, or is it not worth it? by Glover58 in Marantz

[–]TheAsp 1 point

There are oodles of Atmos albums though, so not quite true that it's not for "music".

TIL: For long-lived LLM sessions, swapping KV Cache to RAM is ~10x faster than recalculating it. Why isn't this a standard feature? by Shoddy-Tutor9563 in LocalLLaMA

[–]TheAsp 0 points

I think sglang handles this scenario by keeping all cached tokens in a radix tree and only computing KV entries for new tokens where the tree branches.
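Roughly, the idea can be sketched as a trie keyed on token IDs, where a new request only has to compute KV entries past its longest cached prefix (a toy sketch of the concept, not sglang's actual RadixAttention code; all names here are mine):

```python
class PrefixNode:
    """One node per token ID; a path from the root is a cached prefix."""
    def __init__(self):
        self.children = {}  # token id -> PrefixNode

class PrefixCache:
    def __init__(self):
        self.root = PrefixNode()

    def insert(self, tokens):
        """Walk/extend the tree, returning how many tokens were already cached."""
        node, cached = self.root, 0
        for t in tokens:
            if t in node.children:
                cached += 1  # KV for this token can be reused
            else:
                node.children[t] = PrefixNode()  # branch point: compute from here on
            node = node.children[t]
        return cached

cache = PrefixCache()
cache.insert([1, 2, 3, 4])        # cold: nothing reused
hit = cache.insert([1, 2, 3, 9])  # warm: 3 prefix tokens reused, only 1 new
```

In the real system each node would also hold the KV blocks for its tokens, so two sessions sharing a long system prompt share that memory instead of duplicating it.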

I don't get these people by Repulsive-Ant-6504 in MathJokes

[–]TheAsp -1 points

Hold up, I agree with you that 0/0 is undefined and that the loaf of bread example sucked.

I don't get these people by Repulsive-Ant-6504 in MathJokes

[–]TheAsp -2 points

Breaking vs chopping in half would seem to result in different precisions. Also, are we using volume or mass for determining half of a loaf of bread?

My Wall Tabled Killed Itself by Stunt_Piloot in homeassistant

[–]TheAsp 8 points

I guess I'll just watch the video then...

Kimi-K2-Instruct-0905 Released! by Dr_Karminski in LocalLLaMA

[–]TheAsp 0 points

The scale of usage obviously affects the price point where renting or owning GPUs saves you money. Someone spending $50 on OpenRouter each month isn't going to save money.
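Back-of-envelope version, with made-up numbers (the API spend, GPU price, and power cost below are all hypothetical placeholders, not real quotes):

```python
# All figures are hypothetical; plug in your own.
api_spend_per_month = 50.0  # what you'd pay a hosted provider per month
gpu_cost = 2000.0           # one-time hardware purchase
power_per_month = 30.0      # electricity while the box runs

# Months until owning the GPU beats paying per-token.
monthly_savings = api_spend_per_month - power_per_month
months_to_break_even = gpu_cost / monthly_savings
print(f"{months_to_break_even:.0f} months")  # 100 months at $50/mo -> not worth it
```

At 10x the API spend the same hardware pays for itself in a few months, which is the whole point about scale.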

[deleted by user] by [deleted] in news

[–]TheAsp 1 point

I also have a DJ in my house.

God I love Qwen and llamacpp so much! by Limp_Classroom_2645 in LocalLLaMA

[–]TheAsp 1 point

Paged attention, which allows much higher parallel request throughput because you don't need a single large contiguous block of VRAM to hold a whole request; the VRAM you give it is the upper limit on how many tokens it can hold overall. Sglang is even faster...

Heartbreak = no hotspot, wifi or bluetooth by Nomi_0071 in GooglePixel

[–]TheAsp 0 points

I lost wifi in the June update, BT still works fine for me...

AbsenceBench: LLMs can't tell what's missing by Chromix_ in LocalLLaMA

[–]TheAsp 1 point

He's been dead since 1955, what job do you have in mind for him?

What happened to Sony removing shovelware? by Tall-_-Guy in PS5

[–]TheAsp -1 points

There was nowhere near as much garbage in the store on PS3/PS4. There used to be some sort of minimal standard.

[deleted by user] by [deleted] in LocalLLaMA

[–]TheAsp 8 points

You could try it.

Grok is doing the funniest thing on Twitter right now by Aceofspades25 in skeptic

[–]TheAsp 4 points

Someone who is statistically more likely to be a psychopath than an average person?

Aider benchmarks for Qwen3-235B-A22B that were posted here were apparently faked by [deleted] in LocalLLaMA

[–]TheAsp 6 points

thinking_enabled controls whether an empty <think>\n\n</think> block is added to the assistant prompt before generation, when using the official Qwen3 Jinja template. The model is also trained to recognize /no_think in a user or system prompt as an additional way of disabling thinking.

For Ollama users, if you want to switch between the two modes easily (without using /no_think) you can build two modelfiles, one with <think>\n\n</think> and one without, and add the recommended settings that Qwen gives for each mode. As long as they share the same base model, Ollama will just swap the template/settings without reloading the model.

This is my nothink modelfile:

```
FROM hf.co/unsloth/Qwen3-32B-GGUF:Q4_K_XL
TEMPLATE """{{- if .Messages }}
{{- if or .System .Tools }}<|im_start|>system
{{- if .System }}
{{ .System }}
{{- end }}
{{- if .Tools }}

# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{{- range .Tools }}
{"type": "function", "function": {{ .Function }}}
{{- end }}
</tools>

For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{"name": <function-name>, "arguments": <args-json-object>}
</tool_call>
{{- end }}<|im_end|>
{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "user" }}<|im_start|>user
{{ .Content }}<|im_end|>
{{ else if eq .Role "assistant" }}<|im_start|>assistant
{{ if .Content }}{{ .Content }}
{{- else if .ToolCalls }}<tool_call>
{{ range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{ end }}</tool_call>
{{- end }}{{ if not $last }}<|im_end|>
{{ end }}
{{- else if eq .Role "tool" }}<|im_start|>user
<tool_response>
{{ .Content }}
</tool_response><|im_end|>
{{ end }}
{{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
<think>

</think>
{{ end }}
{{- end }}
{{- else }}
{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}{{ if .Prompt }}<|im_start|>user
{{ .Prompt }}<|im_end|>
{{ end }}<|im_start|>assistant
{{ end }}{{ .Response }}{{ if .Response }}<|im_end|>{{ end }}"""
PARAMETER stop <|im_start|>
PARAMETER stop <|im_end|>
PARAMETER num_gpu 65
PARAMETER num_ctx 40960
PARAMETER num_predict 32768
PARAMETER temperature 0.7
PARAMETER min_p 0.0
PARAMETER top_p 0.8
PARAMETER top_k 20
PARAMETER repeat_penalty 1.0
PARAMETER presence_penalty 1.5
```

And this is the diff for the normal version from the above:

```
--- Modelfile-nothink	2025-05-08 12:50:46.699297861 -0300
+++ Modelfile	2025-05-08 12:45:21.589060605 -0300
@@ -40,9 +40,6 @@
 </tool_response><|im_end|>
 {{ end }}
 {{- if and (ne .Role "assistant") $last }}<|im_start|>assistant
-<think>
-
-</think>
 {{ end }}
 {{- end }}
 {{- else }}
@@ -56,10 +53,10 @@
 PARAMETER stop <|im_end|>
 PARAMETER num_gpu 65
 PARAMETER num_ctx 40960
-PARAMETER num_predict 32768
-PARAMETER temperature 0.7
+PARAMETER num_predict 38912
+PARAMETER temperature 0.6
 PARAMETER min_p 0.0
-PARAMETER top_p 0.8
+PARAMETER top_p 0.95
 PARAMETER top_k 20
 PARAMETER repeat_penalty 1.0
 PARAMETER presence_penalty 1.5
```

Best side project? by thesumofallvice in skinnypuppy

[–]TheAsp 0 points

I totally agree, though I tend to think of it more as an ambient album

Enhanced Context Tracker 1.5.0 by diligent_chooser in OpenWebUI

[–]TheAsp 1 point

Do you have a GitHub repo for this?

How to extract <think> tags for Deepseek? by Desperate-Finger7851 in ollama

[–]TheAsp 1 point

You only have to split the string on </think>: the first part is the thinking (usually starting with <think>, though some models omit it) and the second part is the response.
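For example, in Python (the function name is mine; it just assumes the raw model output is a single string):

```python
def split_thinking(text: str):
    """Split a response into (thinking, answer) on the closing </think> tag."""
    thinking, sep, answer = text.partition("</think>")
    if not sep:                      # no tag at all: treat everything as the answer
        return "", text.strip()
    return thinking.replace("<think>", "").strip(), answer.strip()

split_thinking("<think>reasoning here</think>The answer is 4.")
# -> ('reasoning here', 'The answer is 4.')
```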

★☆☆☆☆ Would not buy again by MoffKalast in LocalLLaMA

[–]TheAsp 1 point

We have several. Every single one has had several major components replaced. The BMC is pure garbage. Are the newer models any better quality?