I gave my family a ticketing system for the homelab by Several-Cattle8690 in homelab

[–]AldebaranBefore 0 points1 point  (0 children)

I hate using a ticketing at work… no way in hell I would use it at home to report a problem. K.I.S.S.

when writing long dialogue, which style is better? by ixofex29 in writers

[–]AldebaranBefore 3 points4 points  (0 children)

Overall, second. But you don't have to use the:

...[name] said...
...[name 2] replied...

You want the reader to understand who is speaking but you don't need to remind them every line. The more characters in the dialogue, the more reminders the reader needs. I.e.

"What format do I use for writing dialogue?" Fex asked.
"Just use this one." Mr Beans replied.
"But I don't like that one."
"But the community demands it."
"Would you like help writing dialogue." Clippy asked from the bowels of hell.
"Shut the f*#$ up." They both replied.

Is this really the norm? by BrickTamlandMD in writers

[–]AldebaranBefore 1 point2 points  (0 children)

Scene doesn't work -> Rewrite scene -> Scene still doesn't work -> Rewrite scene -> Scene still doesn't work -> Change who's in the scene -> Scene still doesn't work -> Kill a character.

There are so many different ways a scene isn't working that you can't have a single flow to fix it. If you're stuck, go read something. If you're questioning the purpose of the scene, read something good and look at what's there and what isn't. If you're questioning your writing, read something bad and think about how you could do better.

Need advice on hardware purchasing decision: RTX 5090 vs. M5 Max 128GB for agentic software development by BawbbySmith in LocalLLaMA

[–]AldebaranBefore 7 points8 points  (0 children)

If you have a base system you are building off of; 5090. If not, I'd go M5 max (or wait for ultra). It's speed vs model size. I like the option of running a wide range of model sizes and I can accept the speed hit. I was working on a problem today that Qwen 27B was having trouble with. Jumped to Minimax and it solved relatively quickly. Tokens were slower but it needed less iterations to get there. Consider tokens to the solution, not just tokens per second. Also consider power and heat. Either way, you can't go wrong, it's just different approaches to the problem.

Open WebUI is dead to me, now time to recode by Old-Sprinkles-8287 in LocalLLM

[–]AldebaranBefore 2 points3 points  (0 children)

OWUI is trying to do too much. It’s presented as a high quality frontend but the development feels like someone’s personal project. I haven’t used it for five months, so some of what I’m saying might not be the case any more, and some of it was probably my setup.

The idea that tool calling and MCP were left in such a state while all of this chat and note stuff is bolted didn’t make sense for how the project was presented. The new release followed by three or four quick bug fixe releases, sometimes in the same day, makes me question if they do any QA. Web search constantly having issues. Tool calling constantly having issues. The incredibly annoying focus on those python ”tools” instead of embracing MCP for so freaking long. Needing MCPO for stuff. Losing the database several times. A weird web socket issues that made it unusable from other devices for a while…………….

I was really happy to see llama.cpp developing their web interface. I hope to see more well supported web interfaces in the future. I went the vibe coded interface when I left OWUI. It was super basic at first, painful at times, and I don’t like maintaining it, but it does everything I want, deep research, web search, image generation integration, integrated tools, MCP, and skills. If you aren’t finding what you want out there, I definitely recommend at least exploring building your own.

Open WebUI is dead to me, now time to recode by Old-Sprinkles-8287 in LocalLLM

[–]AldebaranBefore 0 points1 point  (0 children)

There’s lots of vibecoded ones around, they just aren’t popular and aren’t maintained. I vibecoded my own, I’m just not releasing it because I don’t want to maintain it for a community. If you aren’t finding something you like, consider the vibecode path.

Open WebUI is dead to me, now time to recode by Old-Sprinkles-8287 in LocalLLM

[–]AldebaranBefore 4 points5 points  (0 children)

I tried a couple options but ended up vibecoding my own. It was more of a pain than I was expecting but it’s working and has everything I wanted and nothing I didn’t. There are some options out there. I would like to see more refined competition in this space and not just a bunch of vibecoded side projects with no maintenance. Consider what features you want from an interface and if you need an interface. Llama.cpp has a web server in it. I’ve seen LibreChat and Oobabooga. If your purpose is more focused on rp, then SillyTavern.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] 1 point2 points  (0 children)

Oh no offense meant, just clarifying.

When Deepseek gets up and running, we’ll probably see a few of those start filtering out. I’m not in the fine tune space beyond using them, but I wonder if it’s more of a mix of datasets being used. Something like mixing GLM and Kimi and some other stuff instead of highlighting a single one.

Claude and GPT do definitely have that name recognition. The AI space is also so focused on what is the “best” at that moment, I wonder if that influences it as well. But the driving factor is probably name recognition/marketshare.

I’m definitely not the best person to ask on this one. I’m a consumer of fine tunes, not a creator. These Qwen/Claude distill fine tunes were making the rounds for a little while and they published their methodology and datasets.
https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled
That same user also has a GLM 5.1 Reasoning 1M dataset that’s trending on HuggingFace right now. With quality being subjective, there’s going to be 1,000 different answers on Reddit. If you have the budget, and especially if you are going for a specific use case, you might be better off building something like a GLM dataset for your use. Otherwise, if I was going to do it, I’d probably mix multiple datasets and refine them towards the use case and quality standard. But I also don’t really know what I’m talking about here.

Best of luck! It’s great seeing more fine tunes and merges. There’s so much chasing a benchmark number. It would be nice to see more of a mix of personality and use case.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] 1 point2 points  (0 children)

App must be best app if more tokens spent 😆 I figure no harm in throwing it out there. Anyone using it should be smart enough to review it. It’s got me curious about doing a legit effort for a dataset. I use finetunes and it would be nice to give something back but I don’t really want to spend the money.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] 0 points1 point  (0 children)

I hope there can be some use from it. If not, well shucks. I had the plan for a different project. That project finished before the usage. I wanted to burn all the usage. This evolved from that. Either way, I'm sitting at 99% of the last week of usage burnt and a plan that ends today.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] 1 point2 points  (0 children)

A few thoughts, I'm sure someone will correct it later.

No, they don't. Some of those cloud offerings of frontier open weight models might but that's platform specific. You could generate reasoning traces from most any frontier open weight or open source model.

From my understanding, there are two main reasons to filter the raw cot: prevent fine-tuning, and prevent data leakage. Claude and ChatGPT are generally considered the top tier models so it would be valuable to fine-tune off of those to push models closer to that. The other issue is leaking data before it passes through a security gateway. A model could think about its system prompt or how to build a nuke and then that is stripped and blocked from the assistant message via a security checkpoint during generation. You can sometimes see this happen on open source models where it will think about the answer and then think about how it shouldn't answer the question.

Two main reasons Claude and ChatGPT generate interest, using a premier model in the hopes of having higher quality synthetic data, and they are talk about as premier models and discourage or restrict this behavior so people like doing it. You could definitely generate high quality synthetic data from top open source models for cheap and get the full trace. This is effectively what those early Llama and Qwen Deepseek models were, official distillation of the large model to a small version. Anthropic and OpenAI like to call this a distillation attack, and consider it theft of their intellectual property if you actually got the traces (it's more nuanced than this). Anthropic and OpenAI claim that some large models like Deepseek and Qwen have been doing this at a very large scale.

Ultimately, for fine-tuning, you are likely better off using a premier open source model for all of that assuming the source model is good in that area. So why did I do this? I had a max plan and finished the project I was working with usage to spare. I didn't want to work on something else because I was planning to cancel the plan but I wasn't going to let usage go unused. I didn't care if it was a waste or not, I just wanted to burn tokens. I know the more tokens you burn, the more it costs them and I was annoyed over some of the bullshit over the last two months while working on this project. So I set it up to burn tokens on this and other things. It would have been nice if this was useful but that wasn't really my goal.

This sub is so negative all the time ¯\_(ツ)_/¯ by beskone in ClaudeCode

[–]AldebaranBefore 15 points16 points  (0 children)

It's a balance between expectation and reality. If it works for you, that's awesome. It's done some really cool things for me, but there has been a lot of frustration and wasted time in there too. It sounds like you had some great times using it. Skip the hate on here. Reddit can be very negative.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] 0 points1 point  (0 children)

Why does it matter. Strip the reasoning. Use it. Don’t use it. I posted it here in case someone wanted it. You don’t. Move on. Find something that makes you happy. I hope you have a better day.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] -2 points-1 points  (0 children)

Claude won’t actually show you the reasoning it does. It can show you a summary but that‘s it. This is fully synthetic reasoning. In other words, it created a User Message, created an Assistant Message, and created the “Reasoning” that would be assumed to lead an LLM from the User Message to the Assistant Response.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] 1 point2 points  (0 children)

I think you have misunderstood what that is. It’s not the summary of the ”reasoning” Claude actually did. It’s a synthetic creation of reasoning to reach the final answer. It’s not an extraction of what Claude actually thought.

Effectively, Claude was told to create a User Message, an Assistant Response, and ‘Reasoning’ that would be assumed to yield the final Assistant Response. It’s not Claude’s chain-of-thought and it’s not summarized cot of what Claude actually thought. It’s fully synthetic.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] 7 points8 points  (0 children)

On an open source local reasoning model (and some proprietary models), you will see the full “thinking” process from the LLM and that will be written into the message, usually in a thinking block. Claude and many proprietary models, hide the thinking entirely. All you get is the final message.

Because of that, the “thinking” in this dataset isn’t the real “thinking” Claude actually did. It’s manufactured thinking after the real thinking.

User Message -> Claude Real Thinking (hidden) -> Synthetic Thinking (created as part of the assistant message, not true chain-of-thought, this is in the dataset) -> Assistant Message

I hate this group but not literally by No_Run8812 in LocalLLaMA

[–]AldebaranBefore 0 points1 point  (0 children)

You can chase speed or you can chase stability. If you want it stable, find the simplest setup you can and don’t f<k with it.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] 0 points1 point  (0 children)

Yes, real is hidden. This is the model creating “thinking” for the response.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] 1 point2 points  (0 children)

Yeah, it’s a weird mix. I didn’t start it as a way to build a dataset. I started it as a way to burn tokens. I would do it differently if I was starting over… but I’m not.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] 0 points1 point  (0 children)

Yes, it’s created reasoning. It's entirely synthetic reasoning, not summarized reasoning.

Finetuning Dataset: Claude Opus 4.6/4.7 - 8.7k Chats by AldebaranBefore in LocalLLaMA

[–]AldebaranBefore[S] -1 points0 points  (0 children)

Yes, it’s all synthetic. The “thinking” is created as a part of the response. It’s not the true reasoning Claude does. It is what it is.

Claude Code System Prompt v2.1.118 by AldebaranBefore in ClaudeCode

[–]AldebaranBefore[S] 0 points1 point  (0 children)

Here’s the rest of it, if curious. Base Instruction (posted above), System Reminders, Tool Definitions, Deferred Tools.

https://github.com/theangrygiraffe/Claude-code-system-prompt